

# JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY: KAKINADA KAKINADA – 533 003, Andhra Pradesh, India DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

| III Year - II Semester | L | T | P | C |
|------------------------|---|---|---|---|
| VLSI DESIGN            | 3 | 0 | 0 | 3 |

#### **OBJECTIVES:**

### The main objectives of this course are:

- To learn the MOS Process Technology
- To understand the operation of MOS devices
- Understand and learn the characteristics of CMOS circuit construction.
- Describe the general steps required for processing of CMOS integrated circuits.
- To impart in-depth knowledge about analog and digital CMOS circuits.

### UNIT-I:

**INTRODUCTION AND BASIC ELECTRICAL PROPERTIES OF MOS CIRCUITS:** VLSI Design Flow, Introduction to IC technology, Fabrication process: nMOS, pMOS and CMOS. I<sub>ds</sub> versus V<sub>ds</sub> Relationships, Aspects of MOS transistor Threshold Voltage, MOS transistor Transconductance, Output Conductance and Figure of Merit. nMOS Inverter, Pull-up to Pull-down Ratio for nMOS inverter driven by another nMOS inverter, and through one or more pass transistors. Alternative forms of pull-up, The CMOS Inverter, Latch-up in CMOS circuits, Bi-CMOS Inverter, Comparison between CMOS and BiCMOS technology, MOS Layers, Stick Diagrams, Design Rules and Layout, Layout Diagrams for MOS circuits

### UNIT-II:

**BASIC CIRCUIT CONCEPTS:** Sheet Resistance, Sheet Resistance concept applied to MOS transistors and Inverters, Area Capacitance of Layers, Standard unit of capacitance, some area Capacitance Calculations, The Delay Unit, Inverter Delays, driving large capacitive loads, Propagation Delays, Wiring Capacitances, Choice of layers.

**SCALING OF MOS CIRCUITS:** Scaling models and scaling factors, Scaling factors for device parameters, Limitations of scaling, Limits due to sub threshold currents, Limits on logic levels and supply voltage due to noise and current density. Switch logic, Gate logic.

### UNIT-III:

**BASIC BUILDING BLOCKS OF ANALOG IC DESIGN:** Regions of operation of MOSFET, Modelling of transistor, body bias effect, biasing styles, single stage amplifier with resistive load, single stage amplifier with diode connected load, Common Source amplifier, Common Drain amplifier, Common Gate amplifier, current sources and sinks.

### UNIT-IV:

### CMOS COMBINATIONAL AND SEQUENTIAL LOGIC CIRCUIT DESIGN:

**Static CMOS Design:** Complementary CMOS, Ratioed Logic, Pass-Transistor Logic.

**Dynamic CMOS Design:** Dynamic Logic-Basic Principles, Speed and Power Dissipation of Dynamic Logic, Issues in Dynamic Design, Cascading Dynamic Gates, Choosing a Logic Style,




Gate Design in the Ultra Deep-Submicron Era, Latch Versus Register, Latch based design, timing definitions, positive feedback, instability, Metastability, multiplexer-based latches, Master-Slave Based Edge Triggered Register, clock to q delay, setup time, hold time, reduced clock load master slave registers, Clocked CMOS register. Cross coupled NAND and NOR, SR Master Slave register, Storage mechanism, pipelining

### UNIT-V:

**FPGA DESIGN:** FPGA design flow, Basic FPGA architecture, FPGA Technologies, Introduction to FPGA Families.

**INTRODUCTION TO ADVANCED TECHNOLOGIES:** Giga-scale dilemma, Short channel effects, High–k, Metal Gate Technology, FinFET, TFET.

### **TEXTBOOKS:**

- 1. Essentials of VLSI Circuits and Systems, Kamran Eshraghian, Douglas A. Pucknell and Sholeh Eshraghian, Prentice-Hall of India Private Limited, 2005 Edition.
- 2. Design of Analog CMOS Integrated Circuits by Behzad Razavi, McGraw Hill, 2003
- 3. Digital Integrated Circuits, Jan M. Rabaey, Anantha Chandrakasan and Borivoje Nikolic, 2<sup>nd</sup> edition, 2016.

### **REFERENCES:**

- 1. "Introduction to VLSI Circuits and Systems", John P. Uyemura, John Wiley & Sons, reprint 2009.
- 2. Integrated Nanoelectronics: Nanoscale CMOS, Post-CMOS and Allied Nanotechnologies, Vinod Kumar Khanna, Springer India, 1<sup>st</sup> edition, 2016.
- 3. FinFETs and other multi-gate transistors, Colinge J.P., Editor, New York, Springer, 2008.

### **OUTCOMES:**

### At the end of this course the student will be able to:

- Demonstrate a clear understanding of CMOS fabrication flow and technology scaling.
- Apply the design rules and draw the layout of a given logic circuit.
- Design MOSFET based logic circuit.
- Design basic building blocks in Analog IC design.
- Analyze the behaviour of amplifier circuits with various loads.
- Design various CMOS logic circuits for design of Combinational logic circuits.
- Design amplifier circuits using MOS transistors.
- Design MOSFET based logic circuits using various logic styles like static and dynamic CMOS.
- Analyze the behaviour of static and dynamic logic circuits.

# UNIT - 1

**Introduction and Basic Electrical Properties of MOS Circuits**

VLSI: Very Large Scale Integration.

**Definition of IC:**

An IC (integrated circuit) is a combination of active and passive elements integrated on a single silicon chip. Silicon offers several advantages: it oxidizes readily, and the resulting oxide acts as a good insulating material. Most ICs (about 90%) are made using silicon.

**Trends in microelectronics:**

Electronic products available in the market today are characterized by reliability, size, weight, volume, and cost.

In addition to these, VLSI technology makes it possible to build more powerful and flexible processors.

- The transistor was invented at Bell Laboratories in 1947 by John Bardeen, Walter Brattain, and William Shockley.
- Until the 1950s, electronics was dominated by vacuum tubes.
- The first IC technology was developed in the 1960s, and it brought a revolution to the electronics industry.


**Symbols of MOSFET:**

[Figure: classification of VLSI technology into BJT and MOSFET; the MOSFET is divided into n-channel and p-channel devices, each available in depletion mode and enhancement mode, with the corresponding NMOS and PMOS circuit symbols.]

Note: A BJT is a current-controlled device, whereas a MOSFET is a voltage-controlled device.


**Levels of integration:**

Depending upon the complexity of the integrated circuit, the classification can be given as:

| Level | Approximate device count per chip |
|-------|-----------------------------------|
| SSI (small scale integration) | 10 to 100 |
| MSI (medium scale integration) | 100 to 1000 |
| LSI (large scale integration) | 1000 to 10<sup>5</sup> |
| VLSI (very large scale integration) | above 10<sup>5</sup> |

The upcoming technology is ULSI (ultra large scale integration).

**Difference between BJT and MOSFET:**

| S.No | MOSFET | BJT |
|------|--------|-----|
| 1 | It is a voltage-controlled device. | It is a current-controlled device. |
| 2 | Drain and source terminals are interchangeable. | Collector and emitter terminals are not interchangeable. |
| 3 | Impedance is high. | Impedance is low. |
| 4 | Input impedance is high. | Input impedance is low. |
| 5 | Transconductance is low. | Transconductance is high. |

**Why is it called a FET?**

V<sub>gs</sub> is the voltage between the gate and source terminals. When V<sub>gs</sub> is applied externally, an electric field develops between gate and source, and the conduction of the device is controlled by this field in correspondence with V<sub>gs</sub>. Hence it is called a field effect transistor (FET).

**The IC era:**

- The first IC emerged in the early 1960s.
- Depending on the capability of the technology, a certain number of transistors can be integrated on a single silicon chip.
- In less than three decades, the transistor count rose from about 100 to about 1000 million transistors per chip.

[Figure: number of transistors per chip (in millions) plotted against year, 1997 to 2007.]

**Moore's law:** The relationship between the year and the number of transistors per chip, shown graphically above, is known as Moore's law. It is usually stated as the observation that the number of transistors per chip doubles roughly every two years.

[Table: number of transistors per chip, supply voltage, and channel length (µm).]
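The growth trend described by Moore's law can be illustrated with a small sketch. It assumes a fixed doubling period of two years, which is the commonly quoted figure and not a value taken from these notes; the function name and the starting count are hypothetical.

```python
def projected_transistor_count(initial_count: float,
                               years: float,
                               doubling_period_years: float = 2.0) -> float:
    """Project a transistor count forward in time.

    Assumes the count doubles once every `doubling_period_years`
    (an assumed, idealized form of Moore's law).
    """
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return initial_count * 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    start = 1_000_000  # hypothetical starting count, for illustration only
    for years in (0, 2, 10, 20):
        count = projected_transistor_count(start, years)
        print(f"after {years:2d} years: {count:,.0f} transistors")
```

Real devices only follow this curve approximately, so the output should be read as a trend line rather than a prediction.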

www.Jntufastupdates.com Scanned by CamScanner




Step 4: The uncovered portions of the mask allow UV light to pass through. The exposed photoresist is softened, while the covered portion remains hardened. The softened photoresist and the oxide beneath it are then removed; this process is called etching.


Step 5: A thin oxide layer of 0.1 µm thickness is grown, and the gate terminal is formed using polysilicon and metal.



Step 6: To diffuse n-type impurities into the p-type substrate, the masking and etching processes are carried out again.

Step 7: The uncovered portions of the mask are softened and then removed by the etching process. The n-type impurities are diffused into the p-type substrate.

Step 8: From the n-type diffusions, the drain and source terminals are brought out using polysilicon and metal.

**PMOS fabrication process:**

Step 1: To fabricate a PMOS transistor, we start with an n-type substrate.

Step 2: To improve quality and provide protection, an oxide layer of 1 µm thickness is grown over the entire surface of the n-substrate.

Step 3: A photoresist layer is deposited on top of the oxide layer and exposed to UV light through a suitable mask.

Step 4: The uncovered portions of the mask allow UV light to pass through. The exposed photoresist is softened, while the covered portion remains hardened. The softened photoresist and the oxide beneath it are then removed; this process is called etching.

Step 5: A thin oxide layer of 0.1 µm thickness is grown, and the gate terminal is formed using polysilicon and metal.

Step 6: To diffuse p-type impurities into the n-type substrate, the masking and etching processes are carried out again.

Step 7: The uncovered portions of the mask are softened and then removed by the etching process. The p-type impurities are diffused into the n-type substrate.

Step 8: From the p-type diffusions, the drain and source terminals are brought out using polysilicon and metal.

**CMOS inverter:**

- CMOS stands for complementary metal oxide semiconductor; the inverter produces an output that is the complement of its input.
- CMOS circuits are built from PMOS and NMOS transistors.
- NMOS transistors are faster than PMOS devices because the mobility of electrons is greater than the mobility of holes, i.e., µ<sub>n</sub> ≈ 2.5 µ<sub>p</sub>.

**CMOS inverter circuit:**

[Figure: CMOS inverter with Q1 connected to V<sub>DD</sub>, Q2 connected to GND, a common input V<sub>in</sub>, and a common output V<sub>o</sub>.]

| V<sub>in</sub> | Q1  | Q2  | V<sub>o</sub> |
|----------------|-----|-----|---------------|
| 0              | ON  | OFF | 1             |
| 1              | OFF | ON  | 0             |

**Operation:**

Case (1): V<sub>in</sub> is logic 0. Q1 turns on and Q2 turns off, producing V<sub>o</sub> = logic 1.

Case (2): V<sub>in</sub> is logic 1. Q1 turns off and Q2 turns on, hence V<sub>o</sub> = logic 0.
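The two operating cases can be sketched as a switch-level model. This is only an illustration: it treats Q1 as the pull-up device (assumed to be the PMOS connected to V<sub>DD</sub>) and Q2 as the pull-down device (assumed to be the NMOS connected to GND), and the function name is hypothetical.

```python
def cmos_inverter(vin: int) -> dict:
    """Switch-level model of a CMOS inverter for logic inputs 0 and 1.

    Q1 is modelled as the pull-up switch (assumed PMOS) and Q2 as the
    pull-down switch (assumed NMOS); exactly one of them conducts.
    """
    if vin not in (0, 1):
        raise ValueError("vin must be logic 0 or logic 1")
    q1_on = vin == 0   # pull-up conducts when the input is low
    q2_on = vin == 1   # pull-down conducts when the input is high
    vout = 1 if q1_on else 0
    return {
        "Vin": vin,
        "Q1": "ON" if q1_on else "OFF",
        "Q2": "ON" if q2_on else "OFF",
        "Vout": vout,
    }


if __name__ == "__main__":
    for vin in (0, 1):
        print(cmos_inverter(vin))
```

Because only one transistor conducts in each steady state, there is no direct path from V<sub>DD</sub> to GND, which is the reason for the low static power dissipation of CMOS.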

**Fabrication of CMOS:**

CMOS can be fabricated using different processes:

1. CMOS using the p-well process
2. CMOS using the n-well process
3. Twin-tub process

**1) CMOS fabrication using the n-well process:**

Step 1: To fabricate CMOS using the n-well process, we start with a p-type substrate.

Step 2: Diffuse an n-type region (the n-well) into the p-type substrate.

Step 3: Grow an oxide layer of 1 µm thickness on the surface of the p-type substrate.



Step 8: The n-type and p-type diffusions are formed.

Step 9: Finally, the input and output terminals of the CMOS structure are brought out.

[Figure: n-well CMOS cross-section showing V<sub>in</sub>, V<sub>SS</sub>, and V<sub>DD</sub>.]

**2) CMOS fabrication using the p-well process:**

Step 1: To fabricate CMOS using the p-well process, we start with an n-type substrate.

Step 2: Diffuse a p-type region (the p-well) into the n-type substrate.

Step 3: Grow an oxide layer of 1 µm thickness on the surface of the substrate.

Step 4: For protection, a photoresist layer is deposited and exposed to UV light through a suitable mask.

Step 5: The uncovered portions of the mask are softened and removed by the etching process.

Step 6: A thin oxide layer of 0.1 µm thickness is grown on top of the substrate.

Step 7: The gate terminals are formed over the p-type and n-type regions using polysilicon and metal.

Step 8: To form the p-type and n-type diffusions in the corresponding n-type and p-type regions, repeat steps (1) to (6).

Step 9: Finally, the input and output terminals of the CMOS structure are brought out.

[Figure: p-well CMOS cross-section showing V<sub>in</sub>, V<sub>DD</sub>, and V<sub>SS</sub>.]

**3) CMOS fabrication using the twin-tub process:**

The twin-tub process is a logical extension of the p-well and n-well processes. Fabrication starts from a high-resistivity n-type substrate, and both wells are formed in an epitaxial layer. The design is arranged so that the performance of the devices in one well does not compromise the performance of the devices in the other well. Hence the doping levels can be controlled more readily.

[Figure: twin-tub CMOS cross-section with epitaxial layer, showing V<sub>in</sub>, V<sub>out</sub>, V<sub>DD</sub>, and V<sub>SS</sub>.]

**Comparison between CMOS technology and bipolar technology:**

| S.No | Bipolar technology | CMOS technology |
|------|--------------------|-----------------|
| 1 | It is a current-controlled device. | It is a voltage-controlled device. |
| 2 | Low input impedance and high output drive current. | High input impedance and low output drive current. |
| 3 | High static power dissipation. | Low static power dissipation. |
| 4 | Threshold voltage depends on the type of semiconducting material and on device parameters. | Scalable threshold voltage. |
| 5 | Essentially unidirectional. | Bidirectional capability, i.e., the drain and source terminals can be interchanged. |
| 6 | High transconductance. | Low transconductance. |
| 7 | Low packing density. | High packing density. |
| 8 | Low voltage swing. | High noise margin. |

**Enhancement-mode MOSFET (NMOS):**

An NMOS transistor is built on a p-type substrate containing two n-type diffusions that form the drain and source. Polysilicon and metal contacts are used to provide the leads for the gate, drain, and source terminals. A suitable positive voltage must be applied at the gate terminal to create a channel between drain and source.


**Operation of NMOS:**

An NMOS transistor is built on a p-type substrate, into which two heavily doped n<sup>+</sup> regions are diffused to form the drain and source terminals.

Case (1): V<sub>gs</sub> = 0 V. No channel is formed and no current conduction takes place. The drain and source behave as two junction diodes connected back to back in series, and both diodes are reverse biased.

Case (2): V<sub>gs</sub> > 0 V. A positive voltage is applied at the gate terminal. The majority carriers of the p-type substrate, the holes, are repelled by the gate voltage. As the gate potential is increased step by step, the holes are pushed down into the substrate, leaving a depletion region between drain and source. With a further increase, electrons are attracted to the surface under the gate and form an inversion layer, i.e., an n-channel that connects the n<sup>+</sup> diffusions of the drain and source.

Note: The gate voltage at which the inversion layer forms is called the threshold voltage (V<sub>t</sub>) of the MOS device.

Case (3): V<sub>gs</sub> < V<sub>t</sub>, V<sub>ds</sub> = 0. No channel is formed and hence no current conduction takes place, i.e., I<sub>ds</sub> = 0.

Case (4): V<sub>gs</sub> > V<sub>t</sub>, V<sub>ds</sub> = 0. The channel is formed, but no current conduction takes place because there is no drain-to-source voltage.

Regions of operation:

- Cutoff region: no current conduction.
- Active (non-saturation) region: current conduction takes place.
- Saturation region: the device acts as a constant current source.

Case (5): V<sub>gs</sub> > V<sub>t</sub>, V<sub>ds</sub> < V<sub>gs</sub> - V<sub>t</sub>. The channel is formed, and the gate-to-channel field near the drain is still sufficient to maintain the channel along its whole length. The device is in the non-saturation condition.

Case (6): V<sub>gs</sub> > V<sub>t</sub>, V<sub>ds</sub> > V<sub>gs</sub> - V<sub>t</sub>. The channel is formed, but as V<sub>ds</sub> is raised above V<sub>gs</sub> - V<sub>t</sub> the electric field near the drain terminal becomes insufficient to maintain the channel there. The channel is pinched off near the drain; this is the pinch-off condition.

[Figure: NMOS cross-section showing the channel pinched off near the drain.]

When V<sub>gs</sub> > V<sub>t</sub> and V<sub>ds</sub> > V<sub>gs</sub> - V<sub>t</sub>, the device is in saturation mode and acts as a constant current source.

Note: For an enhancement-mode NMOS, the channel is created by applying a positive voltage at the gate terminal. For a PMOS, a negative voltage must be applied at the gate terminal to create the channel.
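The cases above can be condensed into a small classifier. This is a sketch under the conditions stated in the notes (cutoff below threshold, saturation starting at V<sub>ds</sub> = V<sub>gs</sub> - V<sub>t</sub>); the function name and the example voltages are hypothetical.

```python
def nmos_region(vgs: float, vds: float, vt: float) -> str:
    """Classify the operating region of an enhancement-mode NMOS.

    vgs: gate-to-source voltage, vds: drain-to-source voltage (>= 0),
    vt: threshold voltage. All values are in volts.
    """
    if vds < 0:
        raise ValueError("this sketch only handles vds >= 0")
    if vgs <= vt:
        # No inversion layer, so no channel and no conduction.
        return "cutoff"
    if vds < vgs - vt:
        # Channel exists along the whole length (Ids is zero if vds == 0).
        return "non-saturation"
    # Channel is pinched off near the drain; constant-current behaviour.
    return "saturation"


if __name__ == "__main__":
    vt = 1.0  # hypothetical threshold voltage
    for vgs, vds in [(0.0, 5.0), (3.0, 0.0), (3.0, 1.0), (3.0, 2.0), (3.0, 5.0)]:
        print(f"Vgs={vgs} V, Vds={vds} V -> {nmos_region(vgs, vds, vt)}")
```

Case (4), with V<sub>ds</sub> = 0, falls in the non-saturation branch here: the channel exists, but no current flows until a drain-to-source voltage is applied.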

**Depletion-mode NMOS:**

A depletion-mode MOSFET has a built-in channel, so no external gate voltage is needed to create it.

In a depletion-mode NMOS, the channel between drain and source is created during manufacturing, before the insulating oxide and metal layers are applied.


Note: A depletion-mode NMOS is always on. To remove the channel, a negative voltage must be applied at the gate terminal.

**Body bias effect:**

For an NMOS transistor, the source and body (substrate) potentials should normally be equal, i.e., V<sub>S</sub> = V<sub>B</sub>. If V<sub>SB</sub> ≠ 0, i.e., there is a potential difference between the source and the body, the threshold voltage is affected: it rises above its nominal value. The body is normally connected to a suitable negative voltage (V<sub>BB</sub>) with respect to the source, and the resulting dependence of the threshold voltage on this bias is called the body bias effect.

[Figure: NMOS cross-section with the substrate connected to V<sub>BB</sub>.]

**Relationship between I<sub>ds</sub> and V<sub>ds</sub>:**

The operation of the MOS transistor revolves around the application of V<sub>gs</sub>, which creates a channel between drain and source. The current I<sub>ds</sub> depends on both V<sub>gs</sub> and V<sub>ds</sub>.

The current I<sub>ds</sub> can be given by

$$I_{ds} = -I_{sd} = \frac{Q_c}{\tau_{ds}} \quad (1)$$

where $Q_c$ is the charge induced in the channel and $\tau_{ds}$ is the electron transit time. We know that

$$\tau_{ds} = \frac{L}{v} \quad (2)$$

where $L$ is the length of the channel and $v$ is the velocity. The velocity can be given as

$$v = \mu E_{ds} \quad (3)$$

where $\mu$ is the mobility and $E_{ds}$ is the electric field between drain and source:

$$E_{ds} = \frac{V_{ds}}{L} \quad (4)$$

Substituting (4) in (3):

$$v = \frac{\mu V_{ds}}{L} \quad (5)$$

Substituting (5) in (2):

$$\tau_{ds} = \frac{L^2}{\mu V_{ds}} \quad (6)$$

Substituting (6) in (1):

$$I_{ds} = \frac{Q_c}{\tau_{ds}} = \frac{Q_c\,\mu V_{ds}}{L^2} \quad (7)$$

**Case (1): Non-saturation**

In non-saturation the voltage along the channel varies from 0 at the source to $V_{ds}$ at the drain, so the effective (average) channel voltage is $V_{ds}/2$. The charge $Q_c$ can be given as

$$Q_c = E_g\,\varepsilon_{ins}\,\varepsilon_0\,W L \quad (8)$$

where

- $E_g$ is the average electric field from gate to channel,
- $\varepsilon_0 = 8.854 \times 10^{-12}$ F/m is the permittivity of free space,
- $\varepsilon_{ins}$ is the relative permittivity of the insulator ($\varepsilon_{ins} \approx 4$ for silicon dioxide),
- $W$ is the width of the channel.

With $D$ the oxide thickness,

$$E_g = \frac{(V_{gs} - V_t) - \dfrac{V_{ds}}{2}}{D}$$

so that

$$Q_c = \left[(V_{gs} - V_t) - \frac{V_{ds}}{2}\right]\frac{\varepsilon_0\,\varepsilon_{ins}\,W L}{D} \quad (9)$$

Substituting (9) in (7):

$$I_{ds} = \left[(V_{gs} - V_t) - \frac{V_{ds}}{2}\right]\frac{\varepsilon_0\,\varepsilon_{ins}\,W L}{D}\cdot\frac{\mu V_{ds}}{L^2}
= \frac{\varepsilon_0\,\varepsilon_{ins}\,\mu}{D}\,\frac{W}{L}\left[(V_{gs} - V_t)V_{ds} - \frac{V_{ds}^2}{2}\right]$$

Writing $K = \dfrac{\varepsilon_0\,\varepsilon_{ins}\,\mu}{D}$ and $\beta = K\dfrac{W}{L}$:

$$I_{ds} = K\frac{W}{L}\left[(V_{gs} - V_t)V_{ds} - \frac{V_{ds}^2}{2}\right] = \beta\left[(V_{gs} - V_t)V_{ds} - \frac{V_{ds}^2}{2}\right] \quad (10)$$

The gate capacitance is $C_g = \dfrac{\varepsilon_0\,\varepsilon_{ins}\,W L}{D}$, hence $K = \dfrac{C_g\,\mu}{W L}$ and

$$I_{ds} = \frac{C_g\,\mu}{L^2}\left[(V_{gs} - V_t)V_{ds} - \frac{V_{ds}^2}{2}\right]$$

We know that $C_g = C_0 W L$, where $C_0$ is the gate capacitance per unit area, so

$$I_{ds} = C_0\,\mu\,\frac{W}{L}\left[(V_{gs} - V_t)V_{ds} - \frac{V_{ds}^2}{2}\right]$$

**Case (2): Saturation**

Saturation starts at $V_{ds} = V_{gs} - V_t$. Substituting this in (10):

$$I_{ds} = K\frac{W}{L}\,\frac{(V_{gs} - V_t)^2}{2} = \frac{\beta}{2}\,(V_{gs} - V_t)^2$$

Using $K = \dfrac{C_g\,\mu}{W L}$:

$$I_{ds} = \frac{C_g\,\mu}{2L^2}\,(V_{gs} - V_t)^2$$

and with $C_g = C_0 W L$:

$$I_{ds} = C_0\,\mu\,\frac{W}{2L}\,(V_{gs} - V_t)^2$$
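The non-saturation and saturation expressions for I<sub>ds</sub> can be evaluated with a short sketch. It uses the form I<sub>ds</sub> = K (W/L) [(V<sub>gs</sub> - V<sub>t</sub>) V<sub>ds</sub> - V<sub>ds</sub><sup>2</sup>/2] below saturation and K (W/L) (V<sub>gs</sub> - V<sub>t</sub>)<sup>2</sup>/2 at and beyond V<sub>ds</sub> = V<sub>gs</sub> - V<sub>t</sub>. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
def nmos_ids(vgs: float, vds: float, vt: float,
             k: float, w: float, l: float) -> float:
    """Drain-to-source current of an enhancement-mode NMOS (first-order model).

    k is the process constant K (A/V^2); w and l are channel width and
    length in the same unit, so only their ratio matters.
    """
    if l <= 0 or w <= 0:
        raise ValueError("w and l must be positive")
    beta = k * w / l
    if vgs <= vt:
        return 0.0                        # cutoff: no channel
    vov = vgs - vt                        # effective gate drive
    if vds < vov:
        # Non-saturation (resistive) region.
        return beta * (vov * vds - vds ** 2 / 2)
    # Saturation region: current no longer depends on vds.
    return beta * vov ** 2 / 2


if __name__ == "__main__":
    # Hypothetical device: K = 50 uA/V^2, W/L = 4, Vt = 1 V.
    for vds in (0.0, 0.5, 1.0, 2.0, 3.0, 5.0):
        ids = nmos_ids(vgs=3.0, vds=vds, vt=1.0, k=50e-6, w=4.0, l=1.0)
        print(f"Vds = {vds:.1f} V -> Ids = {ids * 1e6:.1f} uA")
```

The two branches give the same value at V<sub>ds</sub> = V<sub>gs</sub> - V<sub>t</sub>, so the modelled characteristic is continuous at the onset of saturation.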

**Transconductance (g<sub>m</sub>):**

Transconductance is defined as the relationship between the output current I<sub>ds</sub> and the input voltage V<sub>gs</sub>:

$$g_m = \left.\frac{\delta I_{ds}}{\delta V_{gs}}\right|_{V_{ds} = \text{constant}}$$

We know that $I_{ds} = \dfrac{Q_c}{\tau_{ds}}$ and $\tau_{ds} = \dfrac{L^2}{\mu V_{ds}}$, so

$$\delta I_{ds} = \frac{\delta Q_c\,\mu V_{ds}}{L^2}$$

The change in channel charge is related to the change in gate voltage through the gate capacitance, $\delta Q_c = C_g\,\delta V_{gs}$. Therefore

$$g_m = \frac{\delta I_{ds}}{\delta V_{gs}} = \frac{C_g\,\mu V_{ds}}{L^2}$$

We know that $C_g = C_0 W L$, so

$$g_m = C_0\,\mu\,\frac{W}{L}\,V_{ds}$$

**Output conductance (g<sub>ds</sub>):**

The output conductance g<sub>ds</sub> is defined as the relationship between the output current I<sub>ds</sub> and the output voltage V<sub>ds</sub>:

$$g_{ds} = \left.\frac{\delta I_{ds}}{\delta V_{ds}}\right|_{V_{gs} = \text{constant}}$$
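As a sanity check on the transconductance, the sketch below compares the analytic value in saturation with a numerical derivative of the saturation current. It assumes V<sub>ds</sub> = V<sub>gs</sub> - V<sub>t</sub>, for which g<sub>m</sub> = C<sub>g</sub> µ V<sub>ds</sub> / L<sup>2</sup> reduces to β (V<sub>gs</sub> - V<sub>t</sub>) with β = K W / L; the function names and values are hypothetical.

```python
def ids_saturation(vgs: float, vt: float, beta: float) -> float:
    """Saturation current Ids = (beta / 2) * (Vgs - Vt)^2, zero in cutoff."""
    return beta * (vgs - vt) ** 2 / 2 if vgs > vt else 0.0


def gm_analytic(vgs: float, vt: float, beta: float) -> float:
    """Transconductance in saturation, gm = beta * (Vgs - Vt)."""
    return beta * (vgs - vt) if vgs > vt else 0.0


def gm_numeric(vgs: float, vt: float, beta: float, dv: float = 1e-6) -> float:
    """Central-difference estimate of dIds/dVgs at constant Vds (saturation)."""
    upper = ids_saturation(vgs + dv, vt, beta)
    lower = ids_saturation(vgs - dv, vt, beta)
    return (upper - lower) / (2 * dv)


if __name__ == "__main__":
    vgs, vt, beta = 3.0, 1.0, 2.0e-4  # hypothetical values
    print("analytic gm:", gm_analytic(vgs, vt, beta))
    print("numeric  gm:", gm_numeric(vgs, vt, beta))
```

The two results agree to within rounding error, which confirms that differentiating the square-law current gives a transconductance that grows linearly with the gate drive.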

 $w_0 = u v_{dg}$ Since Vas = Vgs - VE  $U_0 = \mu (v_{g_1} - V_1)$ No mos Inverter-The A-mos inverter is more oftenly use and it Can produce full amount of logic levels. Description :-\* The arrangement of nmos Ri - Depletion mode Inverter consists of depletion mode and enchancement mode transistors. R2 - Enchancement Vin \* Here The gate terminal mode of depletion mode noos is L. Vss Connected to the drain terminal of enchancement mode nmos. of share the best open \* The depletion mode transistor is always on because of the inbuilt channel. Truth Table :-Operation :-Case (1):-Vin 92 Vo 91 when Vin is logic'i' DFF 1 ON 0 When Input is logic 1 the D ON ON transistor B2 turns on and transistor B1 turns on because it is depletion mode





**Pull-up to pull-down ratio of an nMOS inverter driven by another nMOS inverter:**

[Figure: an nMOS inverter whose input V<sub>inv</sub> is driven by the output of another nMOS inverter.]

The ratio is found at the point where $V_{in} = V_{out} = V_{inv}$. At this point both transistors are in saturation, and we know that

$$I_{ds} = K\frac{W}{L}\,\frac{(V_{gs} - V_t)^2}{2}$$

For the depletion-mode pull-up, $V_{gs} = 0$:

$$I_{ds} = K\frac{W_{pu}}{L_{pu}}\,\frac{(-V_{td})^2}{2} \quad (1)$$

For the enhancement-mode pull-down, $V_{gs} = V_{inv}$:

$$I_{ds} = K\frac{W_{pd}}{L_{pd}}\,\frac{(V_{inv} - V_t)^2}{2} \quad (2)$$

Equating (1) and (2), we get

$$\frac{W_{pu}}{L_{pu}}\,(-V_{td})^2 = \frac{W_{pd}}{L_{pd}}\,(V_{inv} - V_t)^2$$

With $Z = L/W$:

$$\frac{1}{Z_{pu}}\,(-V_{td})^2 = \frac{1}{Z_{pd}}\,(V_{inv} - V_t)^2
\quad\Rightarrow\quad
\frac{Z_{pu}}{Z_{pd}} = \frac{(-V_{td})^2}{(V_{inv} - V_t)^2}$$

Taking $V_t = 0.2\,V_{DD}$, $V_{td} = -0.6\,V_{DD}$ and $V_{inv} = 0.5\,V_{DD}$:

$$\frac{Z_{pu}}{Z_{pd}} = \frac{(0.6\,V_{DD})^2}{(0.5\,V_{DD} - 0.2\,V_{DD})^2} = \left(\frac{0.6\,V_{DD}}{0.3\,V_{DD}}\right)^2 = \frac{4}{1}$$

The pull-up to pull-down ratio of an nMOS inverter driven by another nMOS inverter is 4:1.

**Pull-up to pull-down ratio of an nMOS inverter driven by another nMOS inverter through one or more pass transistors:**

[Figure: inverter 1 driving inverter 2 through a series of pass transistors.]

When the output of inverter 1 is passed through a series of pass transistors, the full logic level is not obtained because of the threshold voltage of the pass transistors, i.e., $V_{in2} = V_{DD} - V_{tp}$.
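The 4:1 result for an inverter driven directly by another inverter can be reproduced numerically. The sketch below evaluates Z<sub>pu</sub>/Z<sub>pd</sub> = (V<sub>td</sub>)<sup>2</sup> / (V<sub>inv</sub> - V<sub>t</sub>)<sup>2</sup> with all voltages expressed as fractions of V<sub>DD</sub>; the function name is hypothetical.

```python
def pullup_pulldown_ratio(vtd: float, vinv: float, vt: float) -> float:
    """Return Zpu/Zpd for an nMOS inverter with a depletion-mode pull-up.

    vtd:  threshold voltage of the depletion-mode pull-up (negative)
    vinv: inverter switching point (Vin = Vout = Vinv)
    vt:   threshold voltage of the enhancement-mode pull-down
    All voltages must use the same unit (here: fractions of VDD).
    """
    if vinv <= vt:
        raise ValueError("vinv must exceed vt for the pull-down to conduct")
    return (vtd / (vinv - vt)) ** 2


if __name__ == "__main__":
    # Typical values from the notes, normalised to VDD = 1.
    ratio = pullup_pulldown_ratio(vtd=-0.6, vinv=0.5, vt=0.2)
    print(f"Zpu/Zpd = {ratio:.2f} : 1")
```

Because the result depends only on voltage ratios, the same 4:1 figure holds for any supply voltage as long as the thresholds scale with V<sub>DD</sub> as assumed.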


For inverter 1, neglecting $V_{ds1}/2$ in the non-saturated region:

$$\frac{I_{ds1}}{V_{ds1}} = \frac{K}{Z_{pd1}} (V_{DD} - V_t)$$

$$\frac{1}{R_1} = \frac{K}{Z_{pd1}} (V_{DD} - V_t)$$

$$R_1 = \frac{Z_{pd1}}{K (V_{DD} - V_t)}$$

The output voltage is $V_{o1} = I_1 R_1$, where $I_1$ is the saturation current of the depletion mode pull-up:

$$V_{o1} = \frac{K}{Z_{pu1}} \frac{(-V_{td})^2}{2} \cdot \frac{Z_{pd1}}{K (V_{DD} - V_t)}$$

$$V_{o1} = \frac{Z_{pd1}}{Z_{pu1}} \frac{(-V_{td})^2}{2 (V_{DD} - V_t)}$$

For inverter 2:

In saturation, $I_{ds}$ is given by

$$I_{ds} = K \frac{W}{L} \frac{(V_{gs} - V_t)^2}{2}$$

For the depletion mode device, $V_{gs} = 0$:

$$I_2 = K \frac{W_{pu2}}{L_{pu2}} \frac{(-V_{td})^2}{2} = \frac{K}{Z_{pu2}} \frac{(-V_{td})^2}{2}$$

$I_{ds}$ in non-saturation is given by

$$I_{ds} = K \frac{W}{L} \left[ (V_{gs} - V_t) V_{ds} - \frac{V_{ds}^2}{2} \right]$$

For the enhancement mode device,

$$V_{gs} = V_{DD} - V_{tp}$$

$$I_{ds2} = K \frac{W_{pd2}}{L_{pd2}} \left[ (V_{gs} - V_t) V_{ds2} - \frac{V_{ds2}^2}{2} \right]$$

$$\frac{I_{ds2}}{V_{ds2}} = \frac{K}{Z_{pd2}} \left[ (V_{gs} - V_t) - \frac{V_{ds2}}{2} \right]$$

Neglecting $V_{ds2}/2$:

$$\frac{1}{R_2} = \frac{K}{Z_{pd2}} (V_{gs} - V_t) = \frac{K}{Z_{pd2}} (V_{DD} - V_{tp} - V_t)$$

$$V_{o2} = I_2 R_2 = \frac{K}{Z_{pu2}} \frac{(-V_{td})^2}{2} \cdot \frac{Z_{pd2}}{K (V_{DD} - V_{tp} - V_t)}$$

$$V_{o2} = \frac{Z_{pd2}}{Z_{pu2}} \frac{(-V_{td})^2}{2 (V_{DD} - V_{tp} - V_t)}$$

Setting $V_{o1} = V_{o2}$:

$$\frac{Z_{pd1}}{Z_{pu1} (V_{DD} - V_t)} = \frac{Z_{pd2}}{Z_{pu2} (V_{DD} - V_{tp} - V_t)}$$

$$\frac{Z_{pu2}}{Z_{pd2}} = \frac{Z_{pu1}}{Z_{pd1}} \cdot \frac{V_{DD} - V_t}{V_{DD} - V_{tp} - V_t}$$

With $V_t = 0.2V_{DD}$, $V_{tp} = 0.3V_{DD}$ and $Z_{pu1}/Z_{pd1} = 4/1$:

$$\frac{Z_{pu2}}{Z_{pd2}} = \frac{4}{1} \times \frac{V_{DD} - 0.2V_{DD}}{V_{DD} - 0.3V_{DD} - 0.2V_{DD}} = \frac{4}{1} \times \frac{0.8V_{DD}}{0.5V_{DD}} \approx \frac{8}{1}$$

$\therefore$ The pull-up to pull-down ratio of an nMOS inverter driven by another nMOS inverter through one or more pass transistors is 8:1.






The transistors $T_5$ and $T_6$ turn on when $T_1$ and $T_2$ respectively are being turned off.

**Latch-up in CMOS**

\* Latch-up is an inherent problem in CMOS: it provides a low impedance path between $V_{DD}$ and $V_{SS}$.

\* Latch-up may arise due to noise, switching on and off, or incident radiation.

\* The latch-up mechanism can be better understood with the following arrangement.

(Figure: latch-up effect in an n-well structure on a p-substrate, showing the well resistance $R_W$ and the substrate resistance $R_S$.)

\* In the above figure, if sufficient substrate current flows to turn on $Q_p$, then $Q_p$ draws some current through $R_S$.

\* If this current is enough to drive $Q_n$, then $Q_n$ will also turn on.

Hence a short-circuit path between $V_{DD}$ and $V_{SS}$ is obtained; this is called the latch-up problem.




**Processing steps for fabrication of ICs**

\* Oxidation: Silicon is one of the most commonly oxidised materials, and its oxide also acts as a good masking element.

\* To perform the oxidation process, silicon is placed in a furnace and the temperature is raised.

\* Oxidation is of two types:

1) Dry oxidation: here the silicon reacts with O<sub>2</sub> to form SiO<sub>2</sub>.

Si + O<sub>2</sub> $\longrightarrow$ SiO<sub>2</sub>

2) Wet oxidation: silicon reacts with H<sub>2</sub>O to form SiO<sub>2</sub>.

Si + 2H<sub>2</sub>O $\longrightarrow$ SiO<sub>2</sub> + 2H<sub>2</sub>

Here O<sub>2</sub> and H<sub>2</sub>O are called oxidants, which are used to oxidise silicon.

\* Ion implantation: The ion implantation process is used to introduce dopants into a specified material or substrate.

\* Here the dopant is driven into the substrate material with sufficient energy.

\* Depending on its energy, the dopant penetrates the substrate and may affect the lattice atoms.

\* Nuclear stopping: when the dopant is injected into the substrate, depending on its energy it may change the position of a lattice atom, and may damage the lattice if the energy is increased further. This is called nuclear stopping.

\* Electronic stopping: during the ion implantation process, if the dopants change the position of a lattice atom only slightly and cause no damage to the lattice, it is called electronic stopping.

\* Photolithography (or lithography): Photolithography is used to introduce dopants into the substrate at selected positions through a masking element, i.e., lithography (or photolithography) is the process of transferring geometrical patterns from a mask to silicon.

\* Before photolithography technology was available, a mask of suitable pattern was transferred onto silicon using a litho (stone).

\* Metallization: Metallization is the process used to bring out terminals from the device. These terminals provide contact to the outside world, that is, a means to measure the output. Metallization uses ohmic contacts (Al) and silicides (poly + silicon).

\* Encapsulation: Encapsulation is the process used after manufacturing of the device. It provides protection to the whole body (or package).

# 2.1 DRAIN-TO-SOURCE CURRENT Ids versus VOLTAGE Vds RELATIONSHIPS

The whole concept of the MOS transistor evolves from the use of a voltage on the gate to induce a charge in the channel between source and drain, which may then be caused to move from source to drain under the influence of an electric field created by voltage  $V_{ds}$  applied between drain and source. Since the charge induced is dependent on the gate to source voltage  $V_{gs}$ , then  $I_{ds}$  is dependent on both  $V_{gs}$  and  $V_{ds}$ . Consider a structure, as in Figure 2.1, in which electrons will flow source to drain:

$$I_{ds} = -I_{sd} = \frac{\text{Charge induced in channel } (Q_c)}{\text{Electron transit time } (\tau)}$$
(2.1)

First, transit time:

$$\tau_{sd} = \frac{\text{Length of channel } (L)}{\text{Velocity } (v)}$$

but velocity

$$v = \mu E_{ds}$$

where

$\mu$ = electron or hole mobility (surface)

$E_{ds}$ = electric field (drain to source)

Now

$$E_{ds} = \frac{V_{ds}}{L}$$

so that

$$v = \frac{\mu V_{ds}}{L}$$

Thus

$$\tau_{sd} = \frac{L^2}{\mu V_{ds}}$$
(2.2)

Typical values of  $\mu$  at room temperature are:

 $\mu_n = 650 \text{ cm}^2/\text{V} \text{ sec (surface)}$  $\mu_p = 240 \text{ cm}^2/\text{V} \text{ sec (surface)}$ 
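As a rough numerical illustration of equation (2.2), the sketch below evaluates the transit time for electrons and holes using the surface mobilities quoted above. The channel length (5 µm) and the drain-to-source voltage (5 V) are assumptions chosen only for the example, and the function name is hypothetical.

```python
# Transit time from equation (2.2): tau_sd = L^2 / (mu * V_ds).
# The channel length and drain voltage below are illustrative assumptions.

MU_N_SURFACE = 650.0  # cm^2/(V s), electron surface mobility quoted in the text
MU_P_SURFACE = 240.0  # cm^2/(V s), hole surface mobility quoted in the text


def transit_time(length_cm: float, mobility: float, v_ds: float) -> float:
    """Return the source-to-drain transit time in seconds."""
    if v_ds <= 0 or mobility <= 0:
        raise ValueError("mobility and V_ds must be positive")
    return length_cm ** 2 / (mobility * v_ds)


L_CM = 5e-4   # assumed 5 um channel length, expressed in cm
V_DS = 5.0    # assumed drain-to-source voltage in volts

tau_n = transit_time(L_CM, MU_N_SURFACE, V_DS)
tau_p = transit_time(L_CM, MU_P_SURFACE, V_DS)
print(f"tau_n = {tau_n * 1e12:.1f} ps, tau_p = {tau_p * 1e12:.1f} ps")
```

Because the transit time scales as $L^2/\mu$, the hole transit time is longer than the electron transit time by the mobility ratio for the same geometry and voltage.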

# 2.1.1 The Non-saturated Region

Charge induced in channel due to gate voltage is due to the voltage difference between the gate and the channel,  $V_{gs}$  (assuming substrate connected to source). Now note that the voltage along the channel varies linearly with distance X from the source due to the IR drop in the channel (see Figure 1.5) and assuming that the device is not saturated then the average value is  $V_{ds}/2$ . Furthermore, the effective gate voltage  $V_g = V_{gs} - V_t$  where  $V_t$  is the threshold voltage needed to invert the charge under the gate and establish the channel.

Note that the charge/unit area =  $E_g \varepsilon_{ins} \varepsilon_0$ . Thus induced charge

$$Q_c = E_g \varepsilon_{ins} \varepsilon_0 WL$$

where

 $E_g$  = average electric field gate to channel

 $\varepsilon_{ins}$  = relative permittivity of insulation between gate and channel

 $\varepsilon_0$  = permittivity of free space


(*Note:* $\varepsilon_0 = 8.85 \times 10^{-14} \text{ F cm}^{-1}$; $\varepsilon_{ins} \doteq 4.0$ for silicon dioxide)

Now

$$E_g = \frac{\left( (V_{gs} - V_t) - \frac{V_{ds}}{2} \right)}{D}$$

where D = oxide thickness.

Thus

$$Q_{c} = \frac{WL\varepsilon_{ins}\varepsilon_{0}}{D} \left( (V_{gs} - V_{t}) - \frac{V_{ds}}{2} \right)$$
(2.3)

Now, combining equations (2.2) and (2.3) in equation (2.1), we have

$$I_{ds} = \frac{\varepsilon_{ins}\varepsilon_{0}\mu}{D}\frac{W}{L}\left((V_{gs} - V_{t}) - \frac{V_{ds}}{2}\right)V_{ds}$$

or

$$I_{ds} = K \frac{W}{L} \left( (V_{gs} - V_{t}) V_{ds} - \frac{V_{ds}^{2}}{2} \right)$$
(2.4)

in the non-saturated or resistive region where $V_{ds} < V_{gs} - V_t$ and

$$K = \frac{\varepsilon_{ins}\varepsilon_0\mu}{D}$$

The factor W/L is, of course, contributed by the geometry and it is common practice to write

$$\beta = K \frac{W}{L}$$

so that

$$I_{ds} = \beta \left( (V_{gs} - V_t) V_{ds} - \frac{V_{ds}^2}{2} \right)$$
(2.4a)

which is an alternative form of equation (2.4). Noting that gate/channel capacitance

$$C_g = \frac{\varepsilon_{ins}\varepsilon_0 WL}{D}$$
 (parallel plate)

we also have

$$K = \frac{C_g \mu}{WL}$$

so that

$$I_{ds} = \frac{C_g \mu}{L^2} \left( (V_{gs} - V_t) V_{ds} - \frac{V_{ds}^2}{2} \right)$$
(2.4b)

which is a further alternative form of equation (2.4).

Sometimes it is convenient to use gate capacitance per unit area  $C_0$  (which is often denoted  $C_{ox}$ ) rather than  $C_g$  in this and other expressions. Noting that

$$C_g = C_0 WL$$

we may also write

$$I_{ds} = C_0 \mu \frac{W}{L} \left( (V_{gs} - V_t) V_{ds} - \frac{V_{ds}^2}{2} \right)$$
(2.4c)

## 2.1.2 The Saturated Region

Saturation begins when $V_{ds} = V_{gs} - V_t$, since at this point the *IR* drop in the channel equals the effective gate to channel voltage at the drain and we may assume that the current remains fairly constant as $V_{ds}$ increases further. Thus

$$I_{ds} = K \frac{W}{L} \frac{(V_{gs} - V_t)^2}{2}$$
(2.5)

or, we may write

$$I_{ds} = \frac{\beta}{2} \left( V_{gs} - V_t \right)^2$$
 (2.5a)

or

$$I_{ds} = \frac{C_g \mu}{2L^2} \left( V_{gs} - V_t \right)^2$$
(2.5b)

We may also write

$$I_{ds} = C_0 \mu \frac{W}{2L} \left( V_{gs} - V_t \right)^2$$
(2.5c)

The expressions derived for  $I_{ds}$  hold for both enhancement and depletion mode devices, but it should be noted that the threshold voltage for the nMOS depletion mode device (denoted as  $V_{td}$ ) is negative.

Typical characteristics for nMOS transistors are given in Figure 2.2. pMOS transistor characteristics are similar, with suitable reversal of polarity.
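The two regions can be combined into one piecewise function. The following is a minimal sketch of equations (2.4a) and (2.5a); the function name and the value of $\beta$ are assumptions made for illustration, and subthreshold conduction is ignored.

```python
def drain_current(v_gs: float, v_ds: float, v_t: float, beta: float) -> float:
    """I_ds from equations (2.4a) and (2.5a), with beta = K * W / L."""
    v_eff = v_gs - v_t                  # effective gate voltage
    if v_eff <= 0 or v_ds <= 0:
        return 0.0                      # device off (subthreshold current ignored)
    if v_ds < v_eff:                    # non-saturated (resistive) region
        return beta * (v_eff * v_ds - v_ds ** 2 / 2)
    return beta / 2 * v_eff ** 2        # saturated region


BETA = 1e-4  # assumed value in A/V^2, for illustration only

# Enhancement mode device (V_t = +1 V) swept over V_ds at V_gs = 3 V.
for v_ds in (0.5, 1.0, 2.0, 3.0, 5.0):
    print(v_ds, drain_current(3.0, v_ds, 1.0, BETA))

# Depletion mode device (V_td = -3 V) conducts even with V_gs = 0.
print(drain_current(0.0, 5.0, -3.0, BETA))
```

The two expressions agree at $V_{ds} = V_{gs} - V_t$, so the modelled characteristic is continuous at the onset of saturation.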

# 2.2 ASPECTS OF MOS TRANSISTOR THRESHOLD VOLTAGE $V_t$

The gate structure of a MOS transistor consists, electrically, of charges stored in the dielectric layers and in the surface to surface interfaces as well as in the substrate itself.

Switching an enhancement mode MOS transistor from the off to the on state consists in applying sufficient gate voltage to neutralize these charges and enable the underlying silicon to undergo an inversion due to the electric field from the gate.

Switching a depletion mode nMOS transistor from the on to the off state consists in applying enough voltage to the gate to add to the stored charge and invert the 'n' implant region to 'p'.

The threshold voltage  $V_t$  may be expressed as:

$$V_{t} = \phi_{ms} + \frac{Q_{B} - Q_{SS}}{C_{0}} + 2\phi_{fN}$$
(2.6)

where

$Q_B$ = the charge per unit area in the depletion layer beneath the oxide

$Q_{SS}$ = charge density at Si:SiO<sub>2</sub> interface

$C_0$ = capacitance per unit gate area

$\phi_{ms}$ = work function difference between gate and Si

$\phi_{fN}$ = Fermi level potential between inverted surface and bulk Si.

Now, for polysilicon gate and silicon substrate, the value of $\phi_{ms}$ is negative but negligible, and the magnitude and sign of $V_t$ are thus determined by the balance between the remaining negative term $\frac{-Q_{SS}}{C_0}$ and the other two terms, both of which are positive. To evaluate $V_t$, each term is determined as follows:

$$Q_B = \sqrt{2\varepsilon_0 \varepsilon_{Si} q N (2\phi_{fN} + V_{SB})} \text{ coulomb/m}^2$$
  
$$\phi_{fN} = \frac{kT}{q} \ln \frac{N}{n_i} \text{ volts}$$

 $Q_{SS} = (1.5 \text{ to } 8) \times 10^{-8} \text{ coulomb/m}^2$ 

depending on crystal orientation, and where

$V_{SB}$ = substrate bias voltage (negative w.r.t. source for nMOS, positive for pMOS)

$q = 1.6 \times 10^{-19}$ coulomb

$N$ = impurity concentration in the substrate ($N_A$ or $N_D$ as appropriate)

$\varepsilon_{Si}$ = relative permittivity of silicon $\doteq$ 11.7

$n_i$ = intrinsic electron concentration ($1.6 \times 10^{10}$/cm<sup>3</sup> at 300 K)

$k$ = Boltzmann's constant = $1.4 \times 10^{-23}$ joule/K
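To give a feel for the magnitudes involved, the sketch below evaluates $\phi_{fN}$ and $Q_B$ from the expressions above, using the constants as listed. The substrate doping of $10^{15}$/cm<sup>3</sup> is an assumed example value, and the function names are hypothetical.

```python
import math

K_BOLTZMANN = 1.4e-23   # joule/K, value as listed in the text
Q = 1.6e-19             # coulomb
EPS_0 = 8.85e-12        # F/m (the text's 8.85e-14 F/cm converted to SI)
EPS_SI = 11.7           # relative permittivity of silicon
N_I = 1.6e10            # intrinsic concentration per cm^3 at 300 K


def fermi_potential(n_per_cm3: float, temperature_k: float = 300.0) -> float:
    """phi_fN = (kT/q) ln(N / n_i), in volts."""
    return K_BOLTZMANN * temperature_k / Q * math.log(n_per_cm3 / N_I)


def depletion_charge(n_per_cm3: float, v_sb: float = 0.0,
                     temperature_k: float = 300.0) -> float:
    """Q_B = sqrt(2 eps0 epsSi q N (2 phi_fN + V_SB)), in coulomb/m^2."""
    phi = fermi_potential(n_per_cm3, temperature_k)
    n_per_m3 = n_per_cm3 * 1e6          # convert cm^-3 to m^-3
    return math.sqrt(2 * EPS_0 * EPS_SI * Q * n_per_m3 * (2 * phi + v_sb))


N_SUBSTRATE = 1e15  # assumed doping per cm^3, for illustration only

print(f"phi_fN = {fermi_potential(N_SUBSTRATE):.3f} V")
print(f"Q_B (V_SB = 0) = {depletion_charge(N_SUBSTRATE):.3e} C/m^2")
print(f"Q_B (V_SB = 5) = {depletion_charge(N_SUBSTRATE, 5.0):.3e} C/m^2")
```

Here `v_sb` is taken as the magnitude of the substrate bias, so a larger bias gives a larger depletion charge.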

The *body effects* may also be taken into account since the substrate may be biased with respect to the source, as shown in Figure 2.3.



FIGURE 2.3 Body effect (nMOS device shown).

Increasing  $V_{SB}$  causes the channel to be depleted of charge carriers and thus the threshold voltage is raised.

Change in $V_t$ is given by $\Delta V_t \doteq \gamma (V_{SB})^{1/2}$ where $\gamma$ is a constant which depends on substrate doping so that the more lightly doped the substrate, the smaller will be the body effect.

Alternatively, we may write

$$V_{t} = V_{t}(0) + \left(\frac{D}{\varepsilon_{ins}\varepsilon_{0}}\right) \sqrt{2\varepsilon_{0}\varepsilon_{Si}qN} \cdot (V_{SB})^{1/2}$$

where  $V_t(0)$  is the threshold voltage for  $V_{SB} = 0$ . To establish the magnitude of such effects, typical figures for  $V_t$  are as follows:

For nMOS enhancement mode transistors:

$$V_{SB} = 0 \text{ V}; V_{t} = 0.2V_{DD} \ (= +1 \text{ V for } V_{DD} = +5 \text{ V})$$

(Similar but negative values apply for pMOS.)

For nMOS depletion mode transistors:

$$V_{SB} = 0 \text{ V}; V_{td} = -0.7V_{DD} (= -3.5 \text{ V for } V_{DD} = +5 \text{ V})$$
  
 $V_{SB} = 5 \text{ V}; V_{td} = -0.6V_{DD} (= -3.0 \text{ V for } V_{DD} = +5 \text{ V})$ 
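The body-effect expression for $V_t$ given above can be evaluated directly. The sketch below computes the coefficient multiplying $(V_{SB})^{1/2}$ and the resulting threshold shift; the oxide thickness (50 nm) and substrate doping ($10^{15}$/cm<sup>3</sup>) are assumptions for illustration, not values from the text, and the function names are hypothetical.

```python
import math

EPS_0 = 8.85e-12   # F/m, permittivity of free space
EPS_INS = 4.0      # relative permittivity of silicon dioxide
EPS_SI = 11.7      # relative permittivity of silicon
Q = 1.6e-19        # coulomb


def body_effect_coefficient(oxide_thickness_m: float, doping_per_m3: float) -> float:
    """gamma = (D / (eps_ins eps0)) * sqrt(2 eps0 epsSi q N), in V^(1/2)."""
    c_0 = EPS_INS * EPS_0 / oxide_thickness_m       # gate capacitance per unit area
    return math.sqrt(2 * EPS_0 * EPS_SI * Q * doping_per_m3) / c_0


def threshold_with_bias(v_t0: float, v_sb: float, gamma: float) -> float:
    """V_t = V_t(0) + gamma * sqrt(V_SB); v_sb is the bias magnitude (>= 0)."""
    return v_t0 + gamma * math.sqrt(v_sb)


D_OXIDE = 50e-9    # assumed oxide thickness in metres
N_SUB = 1e21       # assumed doping in m^-3 (1e15 per cm^3)

gamma = body_effect_coefficient(D_OXIDE, N_SUB)
for v_sb in (0.0, 2.5, 5.0):
    print(v_sb, round(threshold_with_bias(1.0, v_sb, gamma), 3))
```

Since the coefficient grows as the square root of the doping, a more lightly doped substrate gives a smaller shift, in line with the remark on $\gamma$ above.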

# 2.3 MOS TRANSISTOR TRANSCONDUCTANCE $g_m$ AND OUTPUT CONDUCTANCE $g_{ds}$

Transconductance expresses the relationship between output current  $I_{ds}$  and the input voltage  $V_{gs}$  and is defined as

$$g_m = \left. \frac{\delta I_{ds}}{\delta V_{gs}} \right|_{V_{ds} = \text{constant}}$$

To find an expression for  $g_m$  in terms of circuit and transistor parameters, consider that the charge in channel  $Q_c$  is such that

$$\frac{Q_c}{I_{ds}} = \tau$$

where $\tau$ is transit time. Thus change in current

$$\delta I_{ds} = \frac{\delta Q_c}{\tau_{ds}}$$

Now

$$\tau_{ds} = \frac{L^2}{\mu V_{ds}}$$
(from 2.2)

Thus

$$\delta I_{ds} = \frac{\delta Q_c V_{ds} \mu}{L^2}$$

but change in charge

$$\delta Q_c = C_g \delta V_{gs}$$

so that

$$\delta I_{ds} = \frac{C_g \delta V_{gs} \mu V_{ds}}{L^2}$$

Now

$$g_m = \frac{\delta I_{ds}}{\delta V_{gs}} = \frac{C_g \mu V_{ds}}{L^2}$$

In saturation

$$V_{ds} = V_{gs} - V_t$$

$$g_m = \frac{C_g \mu}{L^2} \left( V_{gs} - V_t \right)$$
(2.7)

and substituting for $C_g = \frac{\varepsilon_{ins}\varepsilon_0 WL}{D}$

$$g_m = \frac{\mu \varepsilon_{ins} \varepsilon_0}{D} \frac{W}{L} (V_{gs} - V_t)$$
(2.7a)

Alternatively,

$$g_m = \beta (V_{gs} - V_t)$$

It is possible to increase the  $g_m$  of a MOS device by increasing its width. However, this will also increase the input capacitance and area occupied.

A reduction in the channel length results in an increase in  $\omega_0$  owing to the higher  $g_m$ . However, the gain of the MOS device decreases owing to the strong degradation of the output resistance =  $1/g_{ds}$ .

The output conductance  $g_{ds}$  can be expressed by

$$g_{ds} = \frac{\delta I_{ds}}{\delta V_{ds}} = \lambda I_{ds} \propto \left(\frac{1}{L}\right)^2$$

Here the strong dependence on the channel length is demonstrated as

$$\lambda \propto \frac{1}{L} \quad \text{and} \quad I_{ds} \propto \frac{1}{L}$$

for the MOS device.


# 2.4 MOS TRANSISTOR FIGURE OF MERIT $\omega_0$

An indication of frequency response may be obtained from the parameter  $\omega_0$  where

$$\omega_0 = \frac{g_m}{C_g} = \frac{\mu}{L^2} \left( V_{gs} - V_t \right) \left( = \frac{1}{\tau_{sd}} \right)$$
(2.8)

This shows that switching speed depends on gate voltage above threshold and on carrier mobility and inversely as the square of channel length. A fast circuit requires that  $g_m$  be as high as possible.
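The relationships $g_m = \beta (V_{gs} - V_t)$ and equation (2.8) can be checked numerically. In the sketch below, $\beta$, the channel length and the bias voltages are assumed example values, and the function names are hypothetical.

```python
def saturation_current(beta: float, v_gs: float, v_t: float) -> float:
    """I_ds in saturation, equation (2.5a)."""
    return beta / 2 * (v_gs - v_t) ** 2


def g_m_saturation(beta: float, v_gs: float, v_t: float) -> float:
    """Transconductance in saturation: g_m = beta * (V_gs - V_t)."""
    return beta * (v_gs - v_t)


def omega_0(mobility: float, length_cm: float, v_gs: float, v_t: float) -> float:
    """Figure of merit from equation (2.8): mu / L^2 * (V_gs - V_t)."""
    return mobility / length_cm ** 2 * (v_gs - v_t)


BETA = 1e-4      # assumed, A/V^2
V_GS, V_T = 5.0, 1.0
MU_N = 650.0     # cm^2/(V s), surface mobility quoted in the text
L_CM = 5e-4      # assumed 5 um channel length in cm

# A central difference of the saturation current reproduces g_m.
step = 1e-3
numeric_gm = (saturation_current(BETA, V_GS + step, V_T)
              - saturation_current(BETA, V_GS - step, V_T)) / (2 * step)
print(numeric_gm, g_m_saturation(BETA, V_GS, V_T))

# omega_0 equals 1/tau_sd when V_ds = V_gs - V_t.
tau_sd = L_CM ** 2 / (MU_N * (V_GS - V_T))
print(omega_0(MU_N, L_CM, V_GS, V_T), 1 / tau_sd)
```

Halving the channel length in this sketch raises $\omega_0$ by a factor of four, which is the inverse-square dependence on $L$ noted above.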

Electron mobility on a (100) oriented n-type inversion layer surface $(\mu_n)$ is larger than that on a (111) oriented surface, and is in fact about three times as large as hole mobility on a (111) oriented p-type inversion layer. Surface mobility is also dependent on the effective gate voltage $(V_{gs} - V_t)$.

For faster nMOS circuits, then, one would choose a (100) oriented p-type substrate in which the inversion layer will have a surface carrier mobility $\mu_n \doteq 650 \text{ cm}^2/\text{V sec}$ at room temperature.

Compare this with the typical bulk mobilities

$$\mu_n = 1250 \text{ cm}^2/\text{V sec}$$

$$\mu_p = 480 \text{ cm}^2/\text{V sec}$$

from which it will be seen that  $\frac{\mu_s}{\mu} = 0.5$  (where  $\mu_s = \text{surface mobility and } \mu = \text{bulk mobility}$ ).

## 2.5 THE PASS TRANSISTOR

Unlike bipolar transistors, the isolated nature of the gate allows MOS transistors to be used as switches in series with lines carrying logic levels in a way that is similar to the use of relay contacts. This application of the MOS device is called the *pass transistor* and switching logic arrays can be formed—for example, an *And* array as in Figure 2.4.



Note: Means must exist so that X assumes ground potential when $A + B + C = 0$.

FIGURE 2.4 Pass transistor And gate.

# 2.6 THE nMOS INVERTER

A basic requirement for producing a complete range of logic circuits is the inverter. This is needed for restoring logic levels, for *Nand* and *Nor* gates, and for sequential and memory circuits of various forms. In the treatment of the inverter used in this section, the authors wish to acknowledge the influence of material previously published by Mead and Conway.

The basic inverter circuit requires a transistor with source connected to ground and a load resistor of some sort connected from the drain to the positive supply rail  $V_{DD}$ . The output is taken from the drain and the input applied between gate and ground.

Resistors are not conveniently produced on the silicon substrate; even modest values occupy excessively large areas so that some other form of load resistance is required. A convenient way to solve this problem is to use a depletion mode transistor as the load, as shown in Figure 2.5.



FIGURE 2.5 nMOS inverter.

Now:

- With no current drawn from the output, the currents  $I_{ds}$  for both transistors must be equal.
- For the depletion mode transistor, the gate is connected to the source so it is always on and only the characteristic curve  $V_{gs} = 0$  is relevant.
- In this configuration the depletion mode device is called the pull-up (p.u.) and the enhancement mode device the pull-down (p.d.) transistor.
- To obtain the inverter transfer characteristic we superimpose the  $V_{gs} = 0$  depletion mode characteristic curve on the family of curves for the enhancement mode device, noting that maximum voltage across the enhancement mode device corresponds to minimum voltage across the depletion mode transistor.
- The points of intersection of the curves as in Figure 2.6 give points on the transfer characteristic, which is of the form shown in Figure 2.7.
- Note that as  $V_{in}(=V_{gs}$  p.d. transistor) exceeds the p.d. threshold voltage current begins to flow. The output voltage  $V_{out}$  thus decreases and the subsequent increases in  $V_{in}$  will cause the p.d. transistor to come out of saturation and become resistive. Note that the p.u. transistor is initially resistive as the p.d. turns on.



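FIGURE 2.6 Derivation of nMOS inverter transfer characteristic: $V_{ds}(\text{enh}) = V_{DD} - V_{ds}(\text{dep}) = V_{out}$ and $V_{gs}(\text{enh}) = V_{in}$; the intersection points give the transfer characteristic.

FIGURE 2.7 nMOS inverter transfer characteristic.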

- During transition, the slope of the transfer characteristic determines the gain:

$$\text{Gain} = \frac{\delta V_{out}}{\delta V_{in}}$$

- The point at which $V_{out} = V_{in}$ is denoted as $V_{inv}$ and it will be noted that the transfer characteristics and $V_{inv}$ can be shifted by variation of the ratio of pull-up to pull-down resistances (denoted $Z_{p.u.}/Z_{p.d.}$ where Z is determined by the length to width ratio of the transistor in question).
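The graphical construction described in the list above (equal currents in the pull-up and pull-down devices, with $V_{ds}(\text{enh}) = V_{out}$ and $V_{ds}(\text{dep}) = V_{DD} - V_{out}$) can be mimicked numerically. The following is a rough sketch using the ideal square-law expressions (2.4a) and (2.5a); the supply voltage, thresholds, 4:1 ratio and function names are assumptions chosen for illustration.

```python
def ids(v_gs: float, v_ds: float, v_t: float, beta: float) -> float:
    """Square-law drain current, equations (2.4a) and (2.5a)."""
    v_eff = v_gs - v_t
    if v_eff <= 0 or v_ds <= 0:
        return 0.0
    if v_ds < v_eff:
        return beta * (v_eff * v_ds - v_ds ** 2 / 2)
    return beta / 2 * v_eff ** 2


def inverter_vout(v_in: float, v_dd: float = 5.0, v_t: float = 1.0,
                  v_td: float = -3.0, z_ratio: float = 4.0) -> float:
    """Find V_out at which pull-down and pull-up currents are equal."""
    beta_pu = 1.0                  # arbitrary scale; only the ratio matters
    beta_pd = z_ratio * beta_pu    # beta is proportional to 1/Z

    def mismatch(v_out: float) -> float:
        pull_down = ids(v_in, v_out, v_t, beta_pd)           # enhancement device
        pull_up = ids(0.0, v_dd - v_out, v_td, beta_pu)      # depletion, V_gs = 0
        return pull_down - pull_up

    # mismatch() is non-decreasing in v_out, so bisection applies.
    low, high = 0.0, v_dd
    for _ in range(60):
        mid = (low + high) / 2
        if mismatch(mid) < 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


for step in range(11):
    v_in = 0.5 * step
    print(f"V_in = {v_in:.1f} V -> V_out = {inverter_vout(v_in):.3f} V")
```

With these assumed values the output stays at $V_{DD}$ until $V_{in}$ exceeds the pull-down threshold and then falls to a low but non-zero logic 0 level, which is the general shape of the transfer characteristic.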



# 2.7 DETERMINATION OF PULL-UP TO PULL-DOWN RATIO $(Z_{p.u.}/Z_{p.d.})$ FOR AN nMOS INVERTER DRIVEN BY ANOTHER nMOS INVERTER

Consider the arrangement in Figure 2.8 in which an inverter is driven from the output of another similar inverter. Consider the depletion mode transistor for which $V_{gs} = 0$ under all conditions, and further assume that in order to cascade inverters without degradation of levels we are aiming to meet the requirement

$$V_{in} = V_{out} = V_{inv}$$



FIGURE 2.8 nMOS inverter driven directly by another inverter.

For equal margins around the inverter threshold, we set  $V_{inv} = 0.5V_{DD}$ . At this point both transistors are in saturation and

$$I_{ds} = K \frac{W}{L} \frac{\left(V_{gs} - V_{t}\right)^{2}}{2}$$

In the depletion mode

$$I_{ds} = K \frac{W_{p.u.}}{L_{p.u.}} \frac{(-V_{td})^2}{2} \text{ since } V_{gs} = 0$$

and in the enhancement mode

$$I_{ds} = K \frac{W_{p.d.}}{L_{p.d.}} \frac{(V_{inv} - V_t)^2}{2} \text{ since } V_{gs} = V_{inv}$$

Equating (since currents are the same) we have

$$\frac{W_{p.d.}}{L_{p.d.}} \left( V_{inv} - V_{t} \right)^{2} = \frac{W_{p.u.}}{L_{p.u.}} \left( -V_{td} \right)^{2}$$


where  $W_{p.d.}$ ,  $L_{p.d.}$ ,  $W_{p.u.}$ , and  $L_{p.u.}$  are the widths and lengths of the pull-down and pull-up transistors respectively.

Now write

$$Z_{p.d.} = \frac{L_{p.d.}}{W_{p.d.}}; Z_{p.u.} = \frac{L_{p.u.}}{W_{p.u.}}$$

we have

$$\frac{1}{Z_{p.d.}} (V_{inv} - V_t)^2 = \frac{1}{Z_{p.u.}} (-V_{td})^2$$

whence

$$V_{inv} = V_t - \frac{V_{td}}{\sqrt{Z_{p.u.}/Z_{p.d.}}}$$
(2.9)

Now we can substitute typical values as follows:

 $V_t = 0.2 V_{DD}; V_{td} = -0.6 V_{DD}$  $V_{inv} = 0.5 V_{DD}$  (for equal margins)

thus, from equation (2.9)

$$0.5 = 0.2 + \frac{0.6}{\sqrt{Z_{p.u.}/Z_{p.d.}}}$$

whence

$$\sqrt{Z_{p.u.}/Z_{p.d.}} = 2$$

and thus

$$Z_{p.u.}/Z_{p.d.} = 4/1$$

for an inverter directly driven by an inverter.
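Equation (2.9) is easy to explore numerically. The sketch below evaluates $V_{inv}$ for a given ratio and inverts the relation to obtain the ratio needed for a target $V_{inv}$; voltages are normalised to $V_{DD} = 1$, and the function names are hypothetical.

```python
import math


def v_inv(z_ratio: float, v_t: float, v_td: float) -> float:
    """Equation (2.9): V_inv = V_t - V_td / sqrt(Z_pu / Z_pd)."""
    return v_t - v_td / math.sqrt(z_ratio)


def required_ratio(v_inv_target: float, v_t: float, v_td: float) -> float:
    """Z_pu / Z_pd needed for a target V_inv, from rearranging (2.9)."""
    return (-v_td / (v_inv_target - v_t)) ** 2


# Typical values from the text, normalised to V_DD = 1.
V_T, V_TD = 0.2, -0.6

print(required_ratio(0.5, V_T, V_TD))
for ratio in (2.0, 4.0, 8.0):
    print(ratio, v_inv(ratio, V_T, V_TD))
```

Increasing the ratio lowers $V_{inv}$ and decreasing it raises $V_{inv}$, which is how the transfer characteristic is shifted by the choice of geometry.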

# 2.8 PULL-UP TO PULL-DOWN RATIO FOR AN nMOS INVERTER DRIVEN THROUGH ONE OR MORE PASS TRANSISTORS

Now consider the arrangement of Figure 2.9 in which the input to inverter 2 comes from the output of inverter 1 but passes through one or more nMOS transistors used as switches in series (called *pass transistors*).

FIGURE 2.9 Pull-up to pull-down ratios for inverting logic coupled by pass transistors.

We are concerned that connection of pass transistors in series will degrade the logic 1 level into inverter 2 so that the output will not be a proper logic 0 level. The critical condition is when point A is at 0 volts and B is thus at $V_{DD}$, but the voltage into inverter 2 at point C is now reduced from $V_{DD}$ by the threshold voltage of the series pass transistor. With all pass transistor gates connected to $V_{DD}$ (as shown in Figure 2.9), there is a loss of $V_{tp}$, however many are connected in series, since no static current flows through them and there can be no voltage drop in the channels. Therefore, the input voltage to inverter 2 is

$$V_{in2} = V_{DD} - V_{tp}$$

where

 $V_{tp}$  = threshold voltage for a pass transistor.

We must now ensure that for this input voltage we get out the same voltage as would be the case for inverter 1 driven with input =  $V_{DD}$ .

Consider inverter 1 (Figure 2.10(a)) with input =  $V_{DD}$ . If the input is at  $V_{DD}$ , then the p.d. transistor  $T_2$  is conducting but with a low voltage across it; therefore, it is in its resistive region represented by  $R_1$  in Figure 2.10. Meanwhile, the p.u. transistor  $T_1$  is in saturation and is represented as a current source.



FIGURE 2.10 Equivalent circuits of inverters 1 and 2.

For the p.d. transistor

$$I_{ds} = K \frac{W_{p.d.1}}{L_{p.d.1}} \left( (V_{DD} - V_t) V_{ds1} - \frac{V_{ds1}^2}{2} \right)$$
(from 2.4)


Therefore

$$R_{1} = \frac{V_{ds1}}{I_{ds}} = \frac{1}{K} \frac{L_{p.d.1}}{W_{p.d.1}} \left( \frac{1}{V_{DD} - V_{t} - \frac{V_{ds1}}{2}} \right)$$

Note that  $V_{ds1}$  is small and  $V_{ds1}/2$  may be ignored. Thus

$$R_1 \doteq \frac{1}{K} Z_{p.d.1} \left( \frac{1}{V_{DD} - V_t} \right)$$

Now, for depletion mode p.u. in saturation with  $V_{gs} = 0$ 

$$I_1 = I_{ds} = K \frac{W_{p.u.1}}{L_{p.u.1}} \frac{(-V_{td})^2}{2}$$
 (from 2.5)

The product

$$I_1 R_1 = V_{out \ 1}$$

Thus

$$V_{out1} = I_1 R_1 = \frac{Z_{p.d.1}}{Z_{p.u.1}} \left(\frac{1}{V_{DD} - V_t}\right) \frac{(-V_{td})^2}{2}$$

Consider inverter 2 (Figure 2.10(b)) when input =  $V_{DD} - V_{tp}$ . As for inverter 1

$$R_{2} \doteq \frac{1}{K} Z_{p.d.2} \frac{1}{((V_{DD} - V_{tp}) - V_{t})}$$

$$I_{2} = K \frac{1}{Z_{p.u.2}} \frac{(-V_{td})^{2}}{2}$$

whence

$$V_{out\,2} = I_2 R_2 = \frac{Z_{p.d.2}}{Z_{p.u.2}} \left( \frac{1}{V_{DD} - V_{tp} - V_t} \right) \frac{(-V_{td})^2}{2}$$

If inverter 2 is to have the same output voltage under these conditions then  $V_{out 1} = V_{out 2}$ . That is

$$I_1 R_1 = I_2 R_2$$

Therefore

$$\frac{Z_{p.u.2}}{Z_{p.d.2}} = \frac{Z_{p.u.1}}{Z_{p.d.1}} \frac{(V_{DD} - V_t)}{(V_{DD} - V_{tp} - V_t)}$$

Taking typical values

$V_t = 0.2V_{DD}$ and $V_{tp} = 0.3V_{DD}$

$$\frac{Z_{p.u.2}}{Z_{p.d.2}} = \frac{Z_{p.u.1}}{Z_{p.d.1}} \frac{0.8}{0.5}$$

Therefore

$$\frac{Z_{p.u.2}}{Z_{p.d.2}} \doteq 2 \ \frac{Z_{p.u.1}}{Z_{p.d.1}} = \frac{8}{1}$$

Summarizing for an nMOS inverter:

- An inverter driven directly from the output of another should have a  $Z_{p.u}/Z_{p.d.}$  ratio of  $\geq 4/1$ .
- An inverter driven through one or more pass transistors should have a  $Z_{p.u.}/Z_{p.d.}$  ratio of  $\geq 8/1$ .

Note: It is the driven, not the driver, whose ratio is affected.
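The ratio rule for the pass-transistor case can be evaluated with a few lines of code. This is a small sketch of the ratio relationship derived above, with voltages normalised to $V_{DD} = 1$; the function name is hypothetical.

```python
def driven_ratio(direct_ratio: float, v_dd: float, v_t: float, v_tp: float) -> float:
    """Z_pu2/Z_pd2 = Z_pu1/Z_pd1 * (V_DD - V_t) / (V_DD - V_tp - V_t)."""
    headroom = v_dd - v_tp - v_t
    if headroom <= 0:
        raise ValueError("input to inverter 2 does not exceed its threshold")
    return direct_ratio * (v_dd - v_t) / headroom


# Typical values from the text: V_t = 0.2 V_DD, V_tp = 0.3 V_DD, direct ratio 4/1.
exact = driven_ratio(4.0, 1.0, 0.2, 0.3)
print(exact)
```

The scaling factor is $0.8/0.5 = 1.6$, which the text takes as approximately 2; this is why the guideline is quoted as 8/1 rather than the slightly smaller computed figure.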

# 2.9 ALTERNATIVE FORMS OF PULL-UP

Up to now we have assumed that the inverter circuit has a depletion mode pull-up transistor as its load. There are, however, at least four possible arrangements:

1. Load resistance  $R_L$  (Figure 2.11). This arrangement is not often used because of the large space requirements of resistors produced in a silicon substrate.




2. nMOS depletion mode transistor pull-up (Figure 2.12).
   - (a) Dissipation is high since rail to rail current flows when $V_{in} =$ logical 1.
   - (b) Switching of output from 1 to 0 begins when $V_{in}$ exceeds $V_t$ of p.d. device.
   - (c) When switching the output from 1 to 0, the p.u. device is non-saturated initially and this presents lower resistance through which to charge capacitive loads.



FIGURE 2.12 nMOS depletion mode transistor pull-up and transfer characteristic.

3. nMOS enhancement mode pull-up (Figure 2.13).
   - (a) Dissipation is high since current flows when $V_{in} =$ logical 1 ($V_{GG}$ is returned to $V_{DD}$).
   - (b) $V_{out}$ can never reach $V_{DD}$ (logical 1) if $V_{GG} = V_{DD}$ as is normally the case.
   - (c) $V_{GG}$ may be derived from a switching source, for example, one phase of a clock, so that dissipation can be greatly reduced.
   - (d) If $V_{GG}$ is higher than $V_{DD}$ then an extra supply rail is required.

4. Complementary transistor pull-up (CMOS) (Figure 2.14).

- (a) No current flow either for logical 0 or for logical 1 inputs.
- (b) Full logical 1 and 0 levels are presented at the output.
- (c) For devices of similar dimensions the p-channel is slower than the n-channel device.




# 2.10 THE CMOS INVERTER

The general arrangement and characteristics are illustrated in Figure 2.14. We have seen (equations 2.4 and 2.5) that the current/voltage relationships for the MOS transistor may be written

$$I_{ds} = K \frac{W}{L} \left( (V_{gs} - V_t) V_{ds} - \frac{V_{ds}^2}{2} \right)$$

in the resistive region, or

$$I_{ds} = K \frac{W}{L} \frac{(V_{gs} - V_{t})^{2}}{2}$$

in saturation. In both cases the factor K is a technology-dependent parameter such that

$$K = \frac{\varepsilon_{ins}\varepsilon_0\mu}{D}$$

The factor W/L is, of course, contributed by the geometry and it is common practice to write

$$\beta = K \frac{W}{L}$$

so that, for example

$$I_{ds} = \frac{\beta}{2} (V_{gs} - V_t)^2$$

in saturation, and where  $\beta$  may be applied to both nMOS and pMOS transistors as follows:

$$\beta_n = \frac{\varepsilon_{ins}\varepsilon_0\mu_n}{D} \frac{W_n}{L_n}$$
$$\beta_p = \frac{\varepsilon_{ins}\varepsilon_0\mu_p}{D} \frac{W_p}{L_p}$$

where $W_n$ and $L_n$, $W_p$ and $L_p$ are the n- and p-transistor dimensions respectively. With regard to Figures 2.14(b) and 2.14(c), it may be seen that the CMOS inverter has five distinct regions of operation.

Considering the static conditions first, it may be seen that in region 1 for which $V_{in}$ = logic 0, we have the p-transistor fully turned on while the n-transistor is fully turned off. Thus no current flows through the inverter and the output is directly connected to $V_{DD}$ through the p-transistor. A good logic 1 output voltage is thus present at the output.

In region 5  $V_{in}$  = logic 1, the n-transistor is fully on while the p-transistor is fully off. Again, no current flows and a good logic 0 appears at the output.


In region 2 the input voltage has increased to a level which just exceeds the threshold voltage of the n-transistor. The n-transistor conducts and has a large voltage between source and drain; so it is in saturation. The p-transistor is also conducting but with only a small voltage across it, it operates in the unsaturated resistive region. A small current now flows through the inverter from  $V_{DD}$  to  $V_{SS}$ . If we wish to analyze the behavior in this region, we equate the p-device resistive region current with the n-device saturation current and thus obtain the voltage and current relationships.

Region 4 is similar to region 2 but with the roles of the p- and n-transistors reversed. However, the current magnitudes in regions 2 and 4 are small and most of the energy consumed in switching from one state to the other is due to the larger current which flows in region 3.

Region 3 is the region in which the inverter exhibits gain and in which both transistors are in saturation.

The currents (with regard to Figure 2.14(c)) in each device must be the same since the transistors are in series, so we may equate the magnitudes of the two saturation currents

$$|I_{dsp}| = I_{dsn}$$

where

$$|I_{dsp}| = \frac{\beta_p}{2} \left( V_{in} - V_{DD} - V_{tp} \right)^2$$

and

$$I_{dsn} = \frac{\beta_n}{2} \left( V_{in} - V_{tn} \right)^2$$

from whence we can express $V_{in}$ in terms of the $\beta$ ratio and the other circuit voltages

$$V_{in} = \frac{V_{DD} + V_{tp} + V_{tn} (\beta_n / \beta_p)^{1/2}}{1 + (\beta_n / \beta_p)^{1/2}}$$
(2.10)

Since both transistors are in saturation, they act as current sources so that the equivalent circuit in this region is two current sources in series between $V_{DD}$ and $V_{SS}$ with the output voltage coming from their common point. The region is inherently unstable in consequence and the changeover from one logic level to the other is rapid.

If $\beta_n = \beta_p$ and if $V_{tn} = -V_{tp}$, then from equation (2.10)

$$V_{in} = 0.5 V_{DD}$$

This implies that the changeover between logic levels is symmetrically disposed about the point at which

$$V_{in} = V_{out} = 0.5 V_{DD}$$

The $\beta$ ratio is often unimportant in many configurations, and in most cases minimum size transistor geometries are used for both n- and p-devices. Figure 2.15 indicates the change in the transfer characteristic as the ratio is varied. The changes indicated in the figure would be for quite large variations in the $\beta$ ratio, and the ratio is thus not too critical in this respect.
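As a quick numerical check of equation (2.10), the sketch below evaluates the switching point for a symmetric inverter and for one with a larger $\beta_n/\beta_p$ ratio. The supply and threshold values are illustrative assumptions, not figures from the text, and the function name is hypothetical.

```python
import math


def switching_threshold(v_dd, v_tn, v_tp, beta_n_over_beta_p):
    """Input voltage in region 3 according to equation (2.10).

    v_tp is the p-device threshold voltage and is negative for an
    enhancement mode device.
    """
    root = math.sqrt(beta_n_over_beta_p)
    return (v_dd + v_tp + v_tn * root) / (1.0 + root)


# Assumed example values: 5 V supply, thresholds of +1 V and -1 V.
symmetric = switching_threshold(5.0, 1.0, -1.0, 1.0)  # beta_n = beta_p
skewed = switching_threshold(5.0, 1.0, -1.0, 4.0)     # beta_n = 4 * beta_p

print(f"symmetric inverter switches at {symmetric:.2f} V")
print(f"beta ratio of 4 moves the switching point to {skewed:.2f} V")
```

With equal $\beta$ values and matched thresholds the result is half the supply voltage, as stated above; a stronger n-device pulls the switching point below $0.5 V_{DD}$.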

The MOS transistor can be modeled with varying degrees of complexity. However, a consideration of the actual physical construction of the device (as in Figure 1.16) leads to some model.




where  $I_{ds}/A$  and  $I_c/A$  are current/area and  $R_B$  is base resistance and  $\tau_B$  is the base transit time (usually in the order of 10-30 ps).

Evaluating, we may see that I/A for bipolar is five times better than that for CMOS. A discussion of the current drive aspects of BiCMOS circuits will be found in Chapter 4 (section 4.8.3).

#### 2.12.3 BiCMOS Inverters

As in nMOS and CMOS logic circuitry, the basic logic element is the inverter circuit.

When designing with BiCMOS in mind, the logical approach is to use MOS switches to perform the logic function and bipolar transistors to drive the output loads. The simplest logic function is that of inversion, and a simple BiCMOS inverter circuit is readily set out as shown in Figure 2.17.

It consists of two bipolar transistors  $T_1$  and  $T_2$  with one nMOS transistor  $T_3$ , and one pMOS transistor  $T_4$ , both being enhancement mode devices. The action of the circuit is straightforward and may be described as follows:

- With $V_{in} = 0$ volts (GND), $T_3$ is off so that $T_1$ will be non-conducting. But $T_4$ is on and supplies current to the base of $T_2$, which will conduct and act as a current source to charge the load $C_L$ toward +5 volts ($V_{DD}$). The output of the inverter will rise to +5 volts less the base to emitter voltage $V_{BE}$ of $T_2$.
- With $V_{in} = +5$ volts ($V_{DD}$), $T_4$ is off so that $T_2$ will be non-conducting. But $T_3$ will now be on and will supply current to the base of $T_1$, which will conduct and act as a current sink to the load $C_L$, discharging it toward 0 volts (GND). The output of the inverter will fall to 0 volts plus the saturation voltage $V_{CEsat}$ from the collector to the emitter of $T_1$.
- $T_1$ and $T_2$ will present low impedances when turned on into saturation and the load $C_L$ will be charged or discharged rapidly.




- The output logic levels will be good and will be close to the rail voltages since $V_{CEsat}$ is quite small and $V_{BE}$ is approximately +0.7 volts.
- The inverter has a high input impedance.
- The inverter has a low output impedance.
- The inverter has a high current drive capability but occupies a relatively small area.
- The inverter has high noise margins.

However, owing to the presence of a DC path from $V_{DD}$ to GND through $T_3$ and $T_1$, this is not a good arrangement to implement since there will be a significant static current flow whenever $V_{in}$ = logic 1. There is also a problem in that there is no discharge path for current from the base of either bipolar transistor when it is being turned off. This will slow down the action of this circuit.

An improved version of this circuit is given in Figure 2.18, in which the DC path through  $T_3$  and  $T_1$  is eliminated, but the output voltage swing is now reduced, since the output cannot fall below the base to emitter voltage  $V_{BE}$  of  $T_1$ .

An improved inverter arrangement, using resistors, is shown in Figure 2.19. In this circuit resistors provide the improved swing of output voltage when each bipolar transistor is off, and also provide discharge paths for base current during turn-off.

The provision of on chip resistors of suitable value is not always convenient and may be space-consuming, so that other arrangements—such as in Figure 2.20—are used. In this circuit, the transistors  $T_5$  and  $T_6$  are arranged to turn on when  $T_2$  and  $T_1$  respectively are being turned off.

In general, BiCMOS inverters offer many advantages where high load current sinking and sourcing is required. The arrangements lead on to the BiCMOS gate circuits which will be dealt with in Chapter 5.




FIGURE 2.18 An alternative BiCMOS inverter with no static current flow.

FIGURE 2.20 An improved BiCMOS inverter using MOS transistors for base current discharge.

## 2.13 LATCH-UP IN CMOS CIRCUITS

A problem which is inherent in the p-well and n-well processes is due to the relatively large number of junctions which are formed in these structures and, as mentioned earlier, the consequent presence of parasitic transistors and diodes. Latch-up is a condition in which the parasitic components give rise to the establishment of low-resistance conducting paths between $V_{DD}$ and $V_{SS}$ with disastrous results. Careful control during fabrication is necessary to avoid this problem.

Latch-up may be induced by glitches on the supply rails or by incident radiation. The mechanism involved may be understood by referring to Figure 2.21, which shows the key parasitic components associated with a p-well structure in which an inverter circuit (for example) has been formed.



FIGURE 2.21 Latch-up effect in p-well structure.

There are, in effect, two transistors and two resistances (associated with the p-well and with regions of the substrate) which form a path between $V_{DD}$ and $V_{SS}$. If sufficient substrate current flows to generate enough voltage across $R_s$ to turn on transistor $T_1$, this will then draw current through $R_p$ and, if the voltage developed is sufficient, $T_2$ will also turn on, establishing a self-sustaining low-resistance path between the supply rails. If the current gains of the two transistors are such that $\beta_1 \times \beta_2 > 1$, latch-up may occur. Equivalent circuits are given in Figure 2.22.

### UNIT-II: Basic Circuit Concepts (31/12/18)

#### Sheet Resistance ($R_s$)

The concept of sheet resistance can be understood by considering a uniform slab of conducting material of length $L$, width $W$ and thickness $t$.

The resistance between the terminals A and B is given by

$$R_{AB} = \frac{\rho L}{A}$$

where $\rho$ is the resistivity, $L$ is the length of the material and $A$ is the cross-sectional area. Since $A = tW$,

$$R_{AB} = \frac{\rho L}{tW}$$

For a uniform square slab, $L = W$, so that

$$R_{AB} = \frac{\rho}{t} = R_s$$

where $R_s$ is the sheet resistance in ohm per square.

Note: The sheet resistance $R_s$ is independent of the area of the square and depends only on the resistivity and the thickness of the layer.

Typical values of sheet resistance for different technologies are tabulated below.

| Layer | $R_s$ (ohm per square), 5 µm | $R_s$ (ohm per square), 2 µm (Orbit) | $R_s$ (ohm per square), 1.2 µm (Orbit) |
|-------|------|------|------|
| Metal | 0.03 | 0.04 | 0.04 |
| Diffusion (active) | 10 → 50 | 20 → 45 | 20 → 45 |
| Silicide | 2 → 4 | - | - |
| Polysilicon | 15 → 100 | 15 → 30 | 15 → 30 |
| n-transistor channel | $10^4$ | $2 \times 10^4$ | $2 \times 10^4$ |
| p-transistor channel | $2.5 \times 10^4$ | $4.5 \times 10^4$ | $4.5 \times 10^4$ |

#### Sheet Resistance Concept Applied to MOS Transistors and Inverters

The sheet resistance concept can be extended to inverters. Consider first a CMOS inverter in which both transistors have 2λ:2λ channels. The $R_s$ values are taken for 5 µm technology for convenience.

- The channel resistance of the pMOS pull-up is $R_{pu} = Z R_s$ with $Z = L/W = 2\lambda/2\lambda = 1$, so $R_{pu} = 1 \times 2.5 \times 10^4 = 25\ k\Omega$.
- The channel resistance of the nMOS pull-down is $R_{pd} = Z R_s$ with $Z = L/W = 2\lambda/2\lambda = 1$, so $R_{pd} = 1 \times 10^4 = 10\ k\Omega$.
- The ON channel resistance of the CMOS inverter is $R_{on} = R_{pu} + R_{pd} = 25\ k\Omega + 10\ k\Omega = 35\ k\Omega$.

Next consider an nMOS inverter with a depletion mode pull-up (8λ:2λ) and an enhancement mode pull-down (2λ:2λ).

- The channel resistance of the depletion mode pull-up is $R_{pu} = Z R_s$ with $Z = L/W = 8\lambda/2\lambda = 4$, so $R_{pu} = 4 \times 10^4 = 40\ k\Omega$.
- The channel resistance of the enhancement mode pull-down is $R_{pd} = Z R_s$ with $Z = L/W = 2\lambda/2\lambda = 1$, so $R_{pd} = 1 \times 10^4 = 10\ k\Omega$.
- The ON channel resistance of the nMOS inverter is $R_{on} = R_{pu} + R_{pd} = 40\ k\Omega + 10\ k\Omega = 50\ k\Omega$.

The sheet resistance concept can also be applied to individual transistors. For example, consider the two transistors of figures (a) and (b). Figure (a) has a channel length of 2λ and a width of 2λ, and figure (b) has a channel length of 8λ and a width of 2λ.

For figure (a) the channel resistance is

$$R = Z R_s, \quad Z = \frac{L}{W} = \frac{2\lambda}{2\lambda} = 1, \quad R = 1 \times 10^4 = 10\ k\Omega$$

For figure (b) the channel resistance is

$$R = Z R_s, \quad Z = \frac{L}{W} = \frac{8\lambda}{2\lambda} = 4, \quad R = 4 \times 10^4 = 40\ k\Omega$$
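The channel resistance calculations above all reduce to $R = Z R_s$ with $Z = L/W$. A minimal sketch follows, using the 5 µm channel sheet resistances quoted in these notes; the function and variable names are hypothetical.

```python
# Channel sheet resistances for 5 um technology, in ohm per square,
# as quoted in the notes.
RS_N_CHANNEL = 1.0e4
RS_P_CHANNEL = 2.5e4


def channel_resistance(length, width, sheet_resistance):
    """R = Z * Rs, where Z = L / W is the number of squares in the channel."""
    squares = length / width
    return squares * sheet_resistance


# Lengths and widths are expressed in units of lambda.
r_fig_a = channel_resistance(2, 2, RS_N_CHANNEL)  # 2:2 transistor
r_fig_b = channel_resistance(8, 2, RS_N_CHANNEL)  # 8:2 transistor
cmos_on = (channel_resistance(2, 2, RS_P_CHANNEL)
           + channel_resistance(2, 2, RS_N_CHANNEL))

print(r_fig_a, r_fig_b, cmos_on)
```

Because only the ratio $L/W$ matters, the same function works whether the dimensions are given in λ or in micrometres.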


#### Area Capacitance of Layers

In the IC fabrication process, the conducting layers are separated from one another and from the substrate by an oxide layer (an insulating material), which acts as the dielectric between two parallel plates, so each layer has a capacitance associated with it. The capacitance is given by

$$C = \frac{\varepsilon_0 \varepsilon_{ins} A}{D}$$

where

- $\varepsilon_0$ is the absolute permittivity, or permittivity of free space ($8.854 \times 10^{-12}$ farad/metre)
- $\varepsilon_{ins}$ is the relative permittivity of the insulator (= 4 for silicon dioxide)
- $A$ is the area of the plates
- $D$ is the thickness of the $SiO_2$ layer

Some typical values of area capacitance for 5 µm, 2 µm and 1.2 µm technologies are tabulated below.

| Capacitance | 5 µm | 2 µm | 1.2 µm |
|-------------|------|------|--------|
| Gate to channel | 4 (1.0) | 8 (1.0) | 16 (1.0) |
| Diffusion (active) | 1 (0.25) | 1.75 (0.22) | 3.75 (0.23) |
| Polysilicon to substrate | 0.4 (0.1) | 0.6 (0.075) | 0.6 (0.038) |
| Metal 1 to substrate | 0.3 (0.075) | 0.33 (0.04) | 0.33 (0.02) |
| Metal 2 to substrate | 0.2 (0.05) | 0.17 (0.02) | 0.17 (0.01) |
| Metal 2 to metal 1 | 0.4 (0.1) | 0.5 (0.06) | 0.5 (0.03) |
| Metal 2 to polysilicon | 0.3 (0.075) | 0.3 (0.038) | 0.3 (0.018) |

Values are in pF × $10^{-4}$/µm²; relative values (with respect to the gate to channel capacitance) are given in brackets.

#### Standard Unit of Capacitance ($\square C_g$)

The standard unit of capacitance $\square C_g$ is defined as the gate to channel capacitance of a MOS transistor having $W = L$ = feature size. It can be calculated for the different technologies as follows.

For 5 µm technology:

- Area per standard square = 5 µm × 5 µm = 25 µm²
- $\square C_g$ = area per standard square × capacitance per unit area = 25 µm² × 4 × $10^{-4}$ pF/µm² = 100 × $10^{-4}$ pF = 0.01 pF

For 2 µm technology:

- Area per standard square = 2 µm × 2 µm = 4 µm²
- $\square C_g$ = 4 µm² × 8 × $10^{-4}$ pF/µm² = 32 × $10^{-4}$ pF = 0.0032 pF

For 1.2 µm technology:

- Area per standard square = 1.2 µm × 1.2 µm = 1.44 µm²
- $\square C_g$ = 1.44 µm² × 16 × $10^{-4}$ pF/µm² = 23.04 × $10^{-4}$ pF ≈ 0.0023 pF

#### Some Area Capacitance Calculations

Here the relative values of capacitance are used to express each capacitance in terms of $\square C_g$, with all dimensions taken from λ-based rules.

3) For polysilicon: the capacitance is calculated as relative area × relative capacitance = 15 × 0.1 $\square C_g$ = 1.5 $\square C_g$. The polysilicon capacitance is thus 1.5 times $\square C_g$.

#### The Delay Unit ($\tau$)

The delay unit $\tau$ is the product of the sheet resistance $R_s$ and the standard unit of gate capacitance $\square C_g$:

$$\tau = R_s \, \square C_g$$

For 5 µm technology:

$$\tau = 10^4\ \Omega \times 0.01\ pF = 10^4 \times 0.01 \times 10^{-12}\ s = 100 \times 10^{-12}\ s = 0.1\ nsec$$

For 2 µm technology:

$$\tau = 2 \times 10^4\ \Omega \times 32 \times 10^{-4}\ pF = 64 \times 10^{-12}\ s = 0.064\ nsec$$

For 1.2 µm technology:

$$\tau = 2 \times 10^4\ \Omega \times 23.04 \times 10^{-4}\ pF = 46.08 \times 10^{-12}\ s \approx 0.046\ nsec$$
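The standard unit of capacitance and the delay unit can be recomputed for the three technologies with a few lines of code. This is only a sketch: the dictionary layout and names are hypothetical, while the gate capacitance per unit area and n-channel sheet resistance figures are the ones tabulated in these notes.

```python
# Per-technology data taken from the tables in the notes:
# feature size (um), gate-to-channel capacitance (pF/um^2),
# n-channel sheet resistance (ohm per square).
TECHNOLOGIES = {
    "5 um": {"feature_um": 5.0, "c_gate": 4e-4, "rs": 1e4},
    "2 um": {"feature_um": 2.0, "c_gate": 8e-4, "rs": 2e4},
    "1.2 um": {"feature_um": 1.2, "c_gate": 16e-4, "rs": 2e4},
}


def standard_gate_capacitance_pf(feature_um, c_gate_pf_per_um2):
    """Gate capacitance of a feature-size square (W = L), in pF."""
    return feature_um ** 2 * c_gate_pf_per_um2


def delay_unit_ns(rs_ohm, cg_pf):
    """tau = Rs * Cg, converted from ohm * pF (= ps) to ns."""
    return rs_ohm * cg_pf * 1e-3


results = {}
for name, tech in TECHNOLOGIES.items():
    cg = standard_gate_capacitance_pf(tech["feature_um"], tech["c_gate"])
    results[name] = (cg, delay_unit_ns(tech["rs"], cg))
    print(f"{name}: Cg = {cg:.4f} pF, tau = {results[name][1]:.3f} ns")
```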

For comparison, the electron transit time from source to drain for 5 µm technology, taking $L = 5\ \mu m$, $\mu_n = 650\ cm^2/V\,s$ and $V_{ds} = 3\ V$, is

$$\tau_{sd} = \frac{L^2}{\mu_n V_{ds}} = \frac{25 \times (10^{-6})^2}{650 \times 10^{-4} \times 3} = \frac{25 \times 10^{-12}}{0.195} \approx 0.128 \times 10^{-9}\ s \approx 0.13\ nsec$$

Note: The delay unit $\tau$ is approximately equal to the electron transit time $\tau_{sd}$.

#### Inverter Delays

Consider an nMOS inverter with a pull-up to pull-down ratio of 4:1, as required for an nMOS inverter driven by another nMOS inverter:

$$\frac{Z_{pu}}{Z_{pd}} = \frac{4}{1}, \quad Z_{pu} = 4 Z_{pd}, \quad R_{pu} = 4 R_s$$

For a pair of minimum size CMOS inverters, the input capacitance of each inverter is $2 \square C_g$ (two gates), the pull-down resistance is $R_s$ and the pull-up resistance is $2.5 R_s$, so that

$$\tau_d = R_s \, \square C_g \, (2 + 5) = 7\tau$$

The overall delay of the CMOS inverter pair is therefore $7\tau$. This delay can be better understood by calculating the rise time $\tau_r$ and the fall time $\tau_f$.

Rise time calculation ($\tau_r$): the rise time is calculated with the input at logic 0, so that the p-device charges the load $C_L$. In saturation,

$$I_{dsp} = \frac{\beta_p}{2} \left( V_{gs} - |V_{tp}| \right)^2 \qquad (1)$$

This current charges $C_L$, so that

$$V_{out} = \frac{I_{dsp}\, \tau_r}{C_L} \qquad (2)$$

$$\tau_r = \frac{V_{out} C_L}{I_{dsp}} = \frac{2 V_{out} C_L}{\beta_p \left( V_{gs} - |V_{tp}| \right)^2}$$

With $V_{gs} = V_{DD}$, $V_{out} = V_{DD}$ and $|V_{tp}| = 0.2 V_{DD}$:

$$\tau_r = \frac{2 V_{DD} C_L}{\beta_p \left( V_{DD} - 0.2 V_{DD} \right)^2} = \frac{2 V_{DD} C_L}{\beta_p \left( 0.8 V_{DD} \right)^2} = \frac{2 C_L}{0.64\, \beta_p V_{DD}}$$

$$\tau_r \approx \frac{3 C_L}{\beta_p V_{DD}}$$
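The rise time expression can be checked numerically by comparing the constant-current result with the rounded $3 C_L / (\beta_p V_{DD})$ form. The load capacitance and $\beta_p$ values below are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def rise_time_constant_current(c_load, beta_p, v_dd, vt_fraction=0.2):
    """Time for the saturated p-device to charge c_load from 0 to v_dd.

    Assumes |Vtp| = vt_fraction * v_dd and a constant saturation current.
    """
    v_tp_mag = vt_fraction * v_dd
    i_dsp = 0.5 * beta_p * (v_dd - v_tp_mag) ** 2
    return c_load * v_dd / i_dsp


def rise_time_approx(c_load, beta_p, v_dd):
    """Rounded form: tau_r ~ 3 * C_L / (beta_p * V_DD)."""
    return 3.0 * c_load / (beta_p * v_dd)


# Assumed example: 1 pF load, beta_p = 100 uA/V^2, 5 V supply.
exact = rise_time_constant_current(1e-12, 1e-4, 5.0)
approx = rise_time_approx(1e-12, 1e-4, 5.0)
print(f"constant-current estimate: {exact * 1e9:.2f} ns")
print(f"3*C_L/(beta_p*V_DD):       {approx * 1e9:.2f} ns")
```

The two results differ by the factor 2/0.64 ≈ 3.125 versus 3, which is the rounding made in the derivation.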


Similarly, the fall time is

$$\tau_f \approx \frac{3 C_L}{\beta_n V_{DD}}$$

Thus the rise time $\tau_r \propto 1/\beta_p$ and the fall time $\tau_f \propto 1/\beta_n$. Both the rise time and the fall time are proportional to $C_L$ and inversely proportional to $V_{DD}$.

#### Driving Large Capacitive Loads

- The problem of driving large capacitive loads arises when signals are propagated from the chip to off-chip peripherals, which present comparatively very large capacitances.
- Here the load capacitance $C_L$ is orders of magnitude greater than the gate capacitance, i.e. $C_L \ge 10^4\, \square C_g$ [assumption].
- In order to reduce the delay we need to increase the channel width, which decreases the resistance.
- To drive large capacitive loads there are three techniques:
  1. Cascaded connection of inverters as drivers
  2. Super buffers as drivers
  3. BiCMOS drivers

#### 1. Cascaded Connection of Inverters as Drivers

- When driving large capacitive loads with minimum delay we should have low resistances.
- Low resistances can be obtained by having low $Z_{pu}$ and $Z_{pd}$ values, that is, wide channels.
- If we increase the width factor $f$ from stage to stage, the resistance decreases, but the gate area, and therefore the input capacitance presented to the previous stage, increases by the same factor.

[Figure: cascaded inverters as drivers, each stage wider than the previous one by the factor $f$.]

For nMOS, the delay per stage is $f\tau$ for $\Delta V_{in}$ (logic 0 to 1 transition of $V_{in}$) and $4f\tau$ for $\nabla V_{in}$ (logic 1 to 0 transition of $V_{in}$), so the delay per nMOS pair is

$$\tau_d = f\tau + 4f\tau = 5f\tau$$

For CMOS, the delay per stage is $2f\tau$ for $\Delta V_{in}$ and $5f\tau$ for $\nabla V_{in}$, so the delay per CMOS pair is

$$\tau_d = 2f\tau + 5f\tau = 7f\tau$$

The relation between the number of inverters $N$ and the width factor $f$ is given by

$$f^N = \frac{C_L}{\square C_g} = y$$

Taking natural logarithms of both sides,

$$N \ln f = \ln y, \quad N = \frac{\ln y}{\ln f}$$

The total delay is minimized when $f = e$, in which case $N = \ln y$.

If $N$ is even, the total delay is

$$t_d = \frac{N}{2} \times 5f\tau = 2.5 N f \tau \quad \text{for nMOS}$$

$$t_d = \frac{N}{2} \times 7f\tau = 3.5 N f \tau \quad \text{for CMOS}$$

If $N$ is odd, the total delay is

$$t_d = [2.5(N-1) + 1] f\tau \quad \text{for nMOS, } \Delta V_{in}$$

$$t_d = [2.5(N-1) + 4] f\tau \quad \text{for nMOS, } \nabla V_{in}$$

$$t_d = [3.5(N-1) + 2] f\tau \quad \text{for CMOS, } \Delta V_{in}$$

$$t_d = [3.5(N-1) + 5] f\tau \quad \text{for CMOS, } \nabla V_{in}$$
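The stage count and total delay of a cascaded driver can be sketched as below, using the per-stage delays quoted in these notes ($f\tau$ and $4f\tau$ for nMOS, $2f\tau$ and $5f\tau$ for CMOS). Rounding $N$ up to a whole number of stages is an assumption of this sketch, and all names are hypothetical.

```python
import math


def num_stages(y, f):
    """Smallest integer N with f**N >= y, where y = C_L / Cg."""
    exact = math.log(y) / math.log(f)
    return math.ceil(exact - 1e-9)  # small guard against float round-off


def cascade_delay_in_tau(n, f, technology="cmos", rising_input=True):
    """Total delay of n cascaded inverters, in units of tau."""
    pair = {"nmos": 5.0, "cmos": 7.0}[technology]
    # Single-stage delays: (rising V_in, falling V_in).
    single = {"nmos": (1.0, 4.0), "cmos": (2.0, 5.0)}[technology]
    if n % 2 == 0:
        return (n / 2) * pair * f
    extra = single[0] if rising_input else single[1]
    return ((pair / 2) * (n - 1) + extra) * f


# Assumed example: load 10^4 times the gate capacitance, f = e.
stages = num_stages(1e4, math.e)
total = cascade_delay_in_tau(stages, math.e, "cmos")
print(f"{stages} stages, total delay = {total:.1f} tau")
```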

#### 2. Super Buffers as Drivers

Super buffers are classified into two types:

1. Inverting type super buffers
2. Non-inverting type super buffers

1. Inverting type super buffer

[Figure: inverting super buffer built from transistors $T_1$ to $T_4$ between $V_{DD}$ and GND, with input $V_{in}$ and output $V_o$.]

- When $V_{in}$ is logic 0, transistors $T_2$ and $T_4$ turn off and transistors $T_1$ and $T_3$ turn on, thereby producing logic 1 at the output.
- When $V_{in}$ is logic 1, transistors $T_2$ and $T_4$ turn on and transistors $T_1$ and $T_3$ turn off, thereby producing logic 0 at the output.

2. Non-inverting type super buffer

[Figure: non-inverting super buffer built from transistors $T_1$ to $T_4$ between $V_{DD}$ and GND, with input $V_{in}$ and output $V_o$.]

- When $V_{in}$ is logic 0, transistors $T_1$ and $T_4$ turn on and transistors $T_2$ and $T_3$ turn off, thereby producing logic 0 at the output.
- When $V_{in}$ is logic 1, transistors $T_2$ and $T_3$ turn on and transistors $T_1$ and $T_4$ turn off, thereby producing logic 1 at the output.

#### 3. BiCMOS Drivers (9/1/19)

- In bipolar technology, the output drive current is larger for a given minimum silicon area.
- The transconductance $g_m$, the current driving capability and the current per unit area are all greater than in MOS technology.
- The collector current $I_c$ is exponentially related to the input voltage $V_{be}$.
- Large currents are therefore obtained for small applied voltages, and the current through the device depends on the base width $W_B$ and on the doping levels.

[Figure: a simple representation of a bipolar transistor driving a load $C_L$.]

- The time $\Delta t$ of interest is the time required to change the output voltage by an amount equal to the change in input voltage $V_{be}$.
- For MOS technology this time depends on the ratio of the load capacitance $C_L$ to the gate capacitance $C_g$.
- In bipolar technology the gate capacitance is replaced by the transconductance $g_m$:

$$\Delta t = \frac{C_L}{g_m}$$

- The total delay for the bipolar driver can therefore be represented as

$$T = T_{in} + \left( \frac{V}{I_b} \right) \frac{C_L}{h_{fe}}$$

where $T_{in}$ is the initial delay inherent in the device, $C_L$ is the load capacitance and $h_{fe}$ is the current amplification factor. For CMOS the corresponding expression is

$$T = T_{in} + \left( \frac{V}{I_b} \right) C_L$$

[Figure: delay versus load capacitance for MOS and bipolar drivers. Both lines start at $T_{in}$; the CMOS line has slope $(V/I_b)$ and the bipolar line has slope $(V/I_b)(1/h_{fe})$.]

#### Propagation Delays

To transfer logic levels from one place to another, a series of pass transistors may be used. When pass transistors are connected in series, their gate terminals are tied together and taken to $V_{DD}$. For example, consider a series of four pass transistors.

[Figure: four pass transistors in series with all gates connected to $V_{DD}$.]

Model for propagation delay: the equivalent circuit is a ladder of series resistances $R$ and node capacitances $C$. At node 2, with $V_1$, $V_2$ and $V_3$ the voltages at consecutive nodes, we can write

$$C \frac{dV_2}{dt} = I_1 - I_2 = \frac{V_1 - V_2}{R} - \frac{V_2 - V_3}{R}$$

In the limit, as the number of sections becomes large, this reduces to

$$RC \frac{dV}{dt} = \frac{d^2 V}{dx^2}$$

where $x$ is the distance along the network from the input. The propagation delay is therefore proportional to $x^2$.

For $N$ sections the total resistance is

$$R_{total} = N r R_s$$

where $R_s$ is the sheet resistance and $r$ is the relative resistance per section. The total capacitance is

$$C_{total} = N c \, \square C_g$$

where $c$ is the relative capacitance per section. The overall delay is given by

$$\tau_p = R_{total} \times C_{total} = N r R_s \times N c \, \square C_g = N^2 r c \, R_s \square C_g = N^2 r c \, \tau$$

where $\tau = R_s \square C_g$.
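The quadratic growth of delay with the number of series pass transistors follows directly from $\tau_p = N^2 r c \tau$. The sketch below simply evaluates that expression; the relative resistance and capacitance values of 1 are assumptions for illustration, and the function name is hypothetical.

```python
def pass_chain_delay(n_sections, r_rel, c_rel, tau):
    """tau_p = N^2 * r * c * tau for N identical sections in series."""
    return n_sections ** 2 * r_rel * c_rel * tau


TAU_NS = 0.1  # delay unit for 5 um technology, from the notes

delays = {n: pass_chain_delay(n, 1.0, 1.0, TAU_NS) for n in (1, 2, 4, 8)}
for n, delay in delays.items():
    print(f"N = {n}: {delay:.1f} ns")
```

Doubling the number of sections quadruples the delay, which is why long chains of pass transistors are slow.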

The propagation delay is proportional to $N^2$: any increase in $N$ results in a sharply increased propagation delay.

#### Fan-in and Fan-out Characteristics

Two major factors that influence the operational speed of a gate are fan-in and fan-out.

- Fan-in: the number of inputs applied to a gate.
- Fan-out: the maximum number of gate inputs that can be driven by the output of a gate.

The delay associated with fan-in and fan-out for three technologies is shown in the figure.

[Figure: delay versus fan-in and fan-out for three technologies. Caption: Fan-in and fan-out characteristics.]

#### Choice of Layers

In designing circuits to meet suitable specifications, several considerations govern the choice of layers:

1. $V_{DD}$ and $V_{SS}$ should be distributed on metal layers wherever possible.
2. Long lengths of polysilicon should be used only after careful consideration because of the relatively high value of its sheet resistance.
3. Polysilicon is unsuitable for routing $V_{DD}$ or $V_{SS}$ other than over small distances.
4. Capacitive effects must also be considered, because diffusion regions have relatively high capacitance to the substrate.

Table of electrical rules (maximum length of wires):

| Layer | 5 µm | 2 µm | 1.2 µm |
|-------|------|------|--------|
| 1. Metal | chip wide | chip wide | chip wide |
| 2. Silicide | 2000λ | Not applicable | Not applicable |
| 3. Polysilicon | 200λ | 400 µm | 250 µm |
| 4. Diffusion (active) | 20λ | 100 µm | 60 µm |

Choice of layers:

| Layer | Capacitance | Resistance | Comments |
|-------|-------------|------------|----------|
| 1. Metal | Low | Low | Good current capability without large voltage drop; used for power distribution and global signals. |
| 2. Silicide | Moderate | Moderate | RC product is moderate, so reasonably long wires are possible; this layer is useful in place of polysilicon in some nMOS processes. |
| 3. Polysilicon | Moderate | High | RC product is moderate, with a high IR drop. |
| 4. Diffusion (active) | High | Moderate | RC product is moderate, with a moderate IR drop. |

#### Wiring Capacitances

The area capacitances, associated with the layers to substrate and from gate to channel, contribute to the overall wiring capacitance. There are three other sources of capacitance to be considered in calculating the overall capacitance:

1. Interlayer capacitance
2. Peripheral (junction) capacitance
3. Fringing field capacitance

1. Interlayer capacitance: parallel plate effects are present between one layer and another. For example, for a given area, metal to polysilicon capacitance is higher than metal to substrate capacitance.

2. Peripheral capacitance: the source and drain n-diffusion regions form junctions with the p-type substrate at a uniform depth. Similarly, p-active regions form junctions with the n-well or n-type substrate. For diffusion regions, each diode thus formed has an associated peripheral capacitance, which is measured in pF per unit length.

The typical values for different technologies are given by

| Diffusion capacitance | 5 µm | 2 µm | 1.2 µm |
|-----------------------|------|------|--------|
| 1. $C_{area}$ | 1.0 × $10^{-4}$ pF/µm² | 1.75 × $10^{-4}$ pF/µm² | 3.75 × $10^{-4}$ pF/µm² |
| 2. $C_{peripheral}$ | 8 × $10^{-4}$ pF/µm | negligible | negligible |

3. Fringing fields:

Capacitance due to fringing field effects can be a major component of the overall capacitance of interconnect wires.

Fringing field capacitance can be of the same order as the area capacitance.

The fringing field capacitance can be given as

$$C_{ff} = \varepsilon_{ins} \varepsilon_0\, l \left\{ \frac{\pi}{\ln \left\{ 1 + \frac{2d}{t} \left( 1 + \sqrt{1 + \frac{t}{d}} \right) \right\}} - \frac{t}{4d} \right\}$$

where $l$ = length of the wire, $t$ = thickness of the wire and $d$ = separation between the wire and the substrate.

The total wire capacitance is then given as

$$C_w = C_{area} + C_{ff}$$

where $C_{area}$ is the area capacitance and $C_{ff}$ is the fringing field capacitance.
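To get a feel for the relative size of the two wire capacitance terms, the sketch below evaluates the fringing field expression $C_{ff} = \varepsilon_{ins}\varepsilon_0\, l\,\{\pi / \ln[1 + (2d/t)(1 + \sqrt{1 + t/d})] - t/(4d)\}$ together with a parallel plate area term. The wire dimensions are assumed values for illustration only, and the function names are hypothetical.

```python
import math

EPS_0 = 8.854e-12  # permittivity of free space, F/m
EPS_INS = 4.0      # relative permittivity of silicon dioxide


def fringing_capacitance(length, thickness, separation):
    """Fringing field capacitance of a wire over the substrate, in farads."""
    log_term = math.log(
        1.0 + (2.0 * separation / thickness)
        * (1.0 + math.sqrt(1.0 + thickness / separation))
    )
    bracket = math.pi / log_term - thickness / (4.0 * separation)
    return EPS_INS * EPS_0 * length * bracket


def area_capacitance(length, width, separation):
    """Parallel plate capacitance C = eps0 * eps_ins * A / D, in farads."""
    return EPS_INS * EPS_0 * length * width / separation


def wire_capacitance(length, width, thickness, separation):
    """Total wire capacitance C_w = C_area + C_ff."""
    return (area_capacitance(length, width, separation)
            + fringing_capacitance(length, thickness, separation))


# Assumed wire: 100 um long, 3 um wide, 1 um thick, 1 um above the substrate.
c_ff = fringing_capacitance(100e-6, 1e-6, 1e-6)
c_area = area_capacitance(100e-6, 3e-6, 1e-6)
c_total = wire_capacitance(100e-6, 3e-6, 1e-6, 1e-6)
print(f"C_area = {c_area * 1e15:.2f} fF, C_ff = {c_ff * 1e15:.2f} fF")
```

For these assumed dimensions the two terms come out at the same order of magnitude, consistent with the remark above.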


#### 3.2 Scaling of MOS Circuits

Microelectronic technology can be characterised with the help of several indicators, or figures of merit, which include:

1. Number of transistors per chip
2. Minimum feature size
3. Power dissipation
4. Maximum operational frequency
5. Die size
6. Production cost

Many of these figures of merit can be improved by reducing the dimensions of transistors, interconnections and the separation between features, and by adjusting the doping levels and supply voltage.

#### Scaling Models and Scaling Factors

Basically there are two scaling models:

1. Constant electric field scaling model
2. Constant voltage scaling model

In addition to these two, there is a third scaling model which is a combination of both and is called the combined voltage and dimension scaling model. The following figure indicates the device dimensions and substrate doping level which are associated with the scaling of a transistor.

- To scale any parameter we use two scaling factors, $1/\alpha$ and $1/\beta$.
- $1/\beta$ is used for the supply voltage level $V_{DD}$ and for the gate oxide thickness $D$; for all other linear dimensions, both horizontal and vertical, we use $1/\alpha$ as the scaling factor.

Note: For the constant electric field scaling model we use $\beta = \alpha$, and for the constant voltage scaling model $\beta = 1$.

#### Scaling Factors for Device Parameters

1. Gate area ($A_g$): $A_g = L \times W$, where $L$ is the length of the channel, which is scaled by $1/\alpha$, and $W$ is the width of the channel, which is scaled by $1/\alpha$. Thus $A_g$ is scaled by

$$\frac{1}{\alpha} \cdot \frac{1}{\alpha} = \frac{1}{\alpha^2}$$

2. Gate capacitance per unit area ($C_0$): $C_0 = \varepsilon_{ox}/D$. Since $D$ is scaled by $1/\beta$, $C_0$ is scaled by $\beta$.

3. Gate capacitance ($C_g$): $C_g = C_0 L W$ is scaled by

$$\beta \cdot \frac{1}{\alpha^2} = \frac{\beta}{\alpha^2}$$

4. Parasitic capacitance ($C_x$): $C_x$ is scaled by $1/\alpha$.

5. Carrier density in the channel ($Q_{on}$): $Q_{on} = C_0 V_{gs}$ is scaled by

$$\beta \cdot \frac{1}{\beta} = 1$$

6. Channel ON resistance ($R_{on}$): $R_{on} = \frac{L}{W} \cdot \frac{1}{Q_{on}\mu}$ is scaled by

$$\frac{1/\alpha}{1/\alpha} \cdot 1 = 1$$

7. Gate delay ($T_d$): $T_d = R_{on} C_g$ is scaled by

$$1 \cdot \frac{\beta}{\alpha^2} = \frac{\beta}{\alpha^2}$$

8. Maximum operating frequency ($f_0$): $f_0 = \frac{W}{L} \cdot \frac{\mu C_0 V_{DD}}{C_g}$ is scaled by

$$\frac{\beta \cdot \frac{1}{\beta}}{\beta / \alpha^2} = \frac{\alpha^2}{\beta}$$

9. Saturation current ($I_{dss}$): $I_{dss} = \frac{C_0 \mu}{2} \cdot \frac{W}{L} \left( V_{gs} - V_t \right)^2$ is scaled by

$$\beta \cdot \left( \frac{1}{\beta} \right)^2 = \frac{1}{\beta}$$


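10. Current density ($J$): the cross-sectional area $A$ of the channel is scaled by $1/\alpha^{2}$, so

$$J = \frac{I_{dss}}{A} \;\rightarrow\; \frac{1/\beta}{1/\alpha^{2}} = \frac{\alpha^{2}}{\beta}$$

11. Switching energy per gate ($E_g$):

$$E_g = \frac{1}{2} C_g V_{DD}^{2} \;\rightarrow\; \frac{\beta}{\alpha^{2}} \cdot \frac{1}{\beta^{2}} = \frac{1}{\alpha^{2}\beta}$$

12. Power dissipation per gate ($P_g$): $P_g = P_{gs} + P_{gd}$, where the static component $P_{gs}$ and the dynamic component $P_{gd}$ scale as

$$P_{gs} = \frac{V_{DD}^{2}}{R_{on}} \;\rightarrow\; \frac{1/\beta^{2}}{1} = \frac{1}{\beta^{2}}, \qquad P_{gd} = E_g f_0 \;\rightarrow\; \frac{1}{\alpha^{2}\beta} \cdot \frac{\alpha^{2}}{\beta} = \frac{1}{\beta^{2}}$$

so $P_g$ is scaled by $1/\beta^{2}$.

13. Power dissipation per unit area ($P_a$):

$$P_a = \frac{P_g}{A_g} \;\rightarrow\; \frac{1/\beta^{2}}{1/\alpha^{2}} = \frac{\alpha^{2}}{\beta^{2}}$$

14. Power-speed product ($P_T$):

$$P_T = P_g T_d \;\rightarrow\; \frac{1}{\beta^{2}} \cdot \frac{\beta}{\alpha^{2}} = \frac{1}{\alpha^{2}\beta}$$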
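The first-order scaling rules above can be checked mechanically. The following is a minimal sketch, assuming the combined voltage and dimension model with factors $1/\alpha$ and $1/\beta$; the function name `scaling_factors` and the dictionary keys are illustrative choices, not part of the notes. Exact fractions are used so the results can be compared directly with the hand derivations.

```python
from fractions import Fraction


def scaling_factors(alpha, beta):
    """Return first-order scaling factors for the combined model.

    alpha: linear dimensions are scaled by 1/alpha.
    beta:  supply voltage and gate-oxide thickness are scaled by 1/beta.
    (Hypothetical helper; keys follow the symbols used in the notes.)
    """
    a, b = Fraction(alpha), Fraction(beta)
    f = {}
    f["Vdd"] = 1 / b
    f["L"] = f["W"] = 1 / a
    f["D"] = 1 / b
    f["Ag"] = f["L"] * f["W"]                      # gate area
    f["C0"] = 1 / f["D"]                           # capacitance per unit area
    f["Cg"] = f["C0"] * f["Ag"]                    # gate capacitance
    f["Qon"] = f["C0"] * f["Vdd"]                  # carrier density in channel
    f["Ron"] = (f["L"] / f["W"]) / f["Qon"]        # channel on-resistance
    f["Td"] = f["Ron"] * f["Cg"]                   # gate delay
    f["f0"] = (f["W"] / f["L"]) * f["C0"] * f["Vdd"] / f["Cg"]
    f["Idss"] = f["C0"] * (f["W"] / f["L"]) * f["Vdd"] ** 2
    f["J"] = f["Idss"] / f["Ag"]                   # cross-section scales like 1/alpha^2
    return f


constant_field = scaling_factors(alpha=2, beta=2)    # beta = alpha
constant_voltage = scaling_factors(alpha=2, beta=1)  # beta = 1
```

With $\alpha = 2$, the constant-field case gives a gate delay factor of $1/\alpha$ and the constant-voltage case gives $1/\alpha^{2}$, which is the usual argument for why constant-voltage scaling is faster but raises current density.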


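Scaling effects:

| S.No | Parameter | Combined voltage and dimension model | Constant electric field model ($\beta = \alpha$) | Constant voltage model ($\beta = 1$) |
|------|-----------|--------------------------------------|--------------------------------------------------|--------------------------------------|
| 1 | Supply voltage ($V_{DD}$) | $1/\beta$ | $1/\alpha$ | $1$ |
| 2 | Channel length ($L$) | $1/\alpha$ | $1/\alpha$ | $1/\alpha$ |
| 3 | Width of the channel ($W$) | $1/\alpha$ | $1/\alpha$ | $1/\alpha$ |
| 4 | Gate oxide thickness ($D$) | $1/\beta$ | $1/\alpha$ | $1$ |
| 5 | Gate area ($A_g$) | $1/\alpha^{2}$ | $1/\alpha^{2}$ | $1/\alpha^{2}$ |
| 6 | Gate capacitance per unit area ($C_0$) | $\beta$ | $\alpha$ | $1$ |
| 7 | Gate capacitance ($C_g$) | $\beta/\alpha^{2}$ | $1/\alpha$ | $1/\alpha^{2}$ |
| 8 | Parasitic capacitance ($C_x$) | $1/\alpha$ | $1/\alpha$ | $1/\alpha$ |
| 9 | Carrier density in the channel ($Q_{on}$) | $1$ | $1$ | $1$ |
| 10 | Channel on-resistance ($R_{on}$) | $1$ | $1$ | $1$ |
| 11 | Gate delay ($T_d$) | $\beta/\alpha^{2}$ | $1/\alpha$ | $1/\alpha^{2}$ |
| 12 | Maximum operating frequency ($f_0$) | $\alpha^{2}/\beta$ | $\alpha$ | $\alpha^{2}$ |
| 13 | Saturation current ($I_{dss}$) | $1/\beta$ | $1/\alpha$ | $1$ |
| 14 | Current density ($J$) | $\alpha^{2}/\beta$ | $\alpha$ | $\alpha^{2}$ |
| 15 | Switching energy per gate ($E_g$) | $1/(\alpha^{2}\beta)$ | $1/\alpha^{3}$ | $1/\alpha^{2}$ |
| 16 | Power dissipation per gate ($P_g$) | $1/\beta^{2}$ | $1/\alpha^{2}$ | $1$ |
| 17 | Power dissipation per unit area ($P_a$) | $\alpha^{2}/\beta^{2}$ | $1$ | $\alpha^{2}$ |
| 18 | Power-speed product ($P_T$) | $1/(\alpha^{2}\beta)$ | $1/\alpha^{3}$ | $1/\alpha^{2}$ |

Limitations of Scaling:

So far, in discussing the various effects of scaling, we have neglected the built-in (junction) potential $V_B$, which in turn depends on the substrate doping level. This is acceptable so long as $V_B$ is small compared to $V_{DD}$.

Substrate doping scaling factors:

As the channel length of a MOS transistor is reduced, the depletion region width must also be scaled down to prevent the source and drain depletion regions from meeting.

The depletion width $d$ for the junction can be given as

$$d = \sqrt{\frac{2\,\varepsilon_0\,\varepsilon_{ins}\,V}{q\,N_B}}, \qquad V = V_a + V_B$$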

where

- $q$ = electron charge
- $\varepsilon_0$ = permittivity of free space
- $\varepsilon_{ins}$ = relative permittivity of the material
- $N_B$ = substrate doping level
- $V_a$ = applied voltage
- $V_B$ = built-in potential

The built-in potential $V_B$ can be given as

$$V_B = \frac{kT}{q} \ln\left(\frac{N_B N_D}{n_i^{2}}\right)$$

where $N_D$ = drain (or source) doping level and $n_i$ = intrinsic carrier concentration.

\* If we increase $N_B$ to reduce $d$, then $V_B$ also increases at the same time.

\* For the combined voltage and dimension model, the applied voltage can be written as $V_a = m V_B$, where $m$ is a real number, so that the unscaled junction voltage is

$$V = V_a + V_B = V_B (m + 1)$$

When the applied voltage is scaled by $1/\beta$, the junction voltage becomes

$$V' = \frac{m V_B}{\beta} + V_B = \frac{V_B (m + \beta)}{\beta}$$

The effective voltage scaling factor can therefore be given as

$$V_S = \frac{V'}{V} = \frac{V_B (m + \beta)/\beta}{V_B (m + 1)} = \frac{m + \beta}{\beta (m + 1)}$$

Limitations due to subthreshold currents ($I_{sub}$):

One of the major concerns in the scaling of devices is the effect on the subthreshold current $I_{sub}$, which can be given as

$$I_{sub} \propto \exp\left(\frac{V_{gs} - V_t}{kT/q}\right)$$

When a transistor is in the off state, the value of $V_{gs} - V_t$ is negative, and its magnitude should be as large as possible to minimise $I_{sub}$. As the voltages are scaled down, the ratio of $V_{gs} - V_t$ to $kT/q$ reduces, so that the subthreshold current increases.
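To get a feel for how the built-in potential limits voltage scaling, the two expressions above can be evaluated numerically. This is only a sketch: the doping levels and intrinsic concentration in the usage lines are illustrative assumptions (typical silicon orders of magnitude in cm$^{-3}$), not values from the notes, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
Q_E = 1.602176634e-19    # electron charge, C


def built_in_potential(n_b, n_d, n_i, temp_k=300.0):
    """V_B = (kT/q) ln(N_B N_D / n_i^2); all concentrations in the same units."""
    return (K_B * temp_k / Q_E) * math.log(n_b * n_d / n_i ** 2)


def effective_voltage_scale(m, beta):
    """Effective junction-voltage scaling (m + beta) / (beta (m + 1)),
    where the applied voltage is V_a = m * V_B before scaling."""
    return (m + beta) / (beta * (m + 1))


# Assumed example values (not from the notes).
v_b = built_in_potential(n_b=1e16, n_d=1e20, n_i=1.5e10)
v_b_heavier = built_in_potential(n_b=1e17, n_d=1e20, n_i=1.5e10)
scale = effective_voltage_scale(m=4, beta=2)
```

With $m = 4$ and $\beta = 2$ the effective factor is 0.6 rather than the nominal $1/\beta = 0.5$, because the built-in potential does not scale with the applied voltage.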

\* For this reason, it may be desirable to scale both $V_{gs}$ and $V_t$ by a factor $1/b > 1/\alpha$, since $\alpha$ is generally greater than $b$.

\* The maximum electric field across the depletion region can be given as

$$E_{max} = \frac{2V}{d} = \frac{2 (V_a + V_B)}{d}$$

\* The junction breakdown voltage can be given as

$$BV = \frac{\varepsilon_0\,\varepsilon_{ins}\,E^{2}}{2 q N_B}$$

Note: Extra care is therefore required in estimating the breakdown voltages for scaled devices. The electric field is greater, and the breakdown voltage correspondingly lower, at the corners of diffusion regions underlying SiO<sub>2</sub>.

Limitations on logic levels and supply voltage due to noise:

The major advantages of scaling devices are smaller gate delay times, i.e. higher operating frequency, and lower internal power consumption.

\* The reduced inter-feature spacing and the higher switching speeds increase noise in VLSI chips. Noise may also be amplified and is thus a major concern.


\* The mean square current fluctuations in the channel can be given as

$$\overline{i^{2}} = 4kT\,R_n\,g_m^{2}\,\Delta f \qquad (1)$$

where $R_n$ = equivalent noise resistance, $\Delta f$ = bandwidth, $g_m = \beta V_p$ and $V_p$ = pinch-off voltage.

The equivalent noise resistance $R_n$ can be given as

$$R_n = \frac{1}{g_m}\left(\frac{1}{2}\,\frac{V_g'}{V_p'} + \frac{1}{6}\right) \qquad (2)$$

where $V_g' = V_{gs} - V_t + V_B$ and $V_p' = V_p + V_B$. Similarly, the product $R_n g_m$ can be given as

$$R_n g_m = \frac{1}{2}\,\frac{V_{gs} - V_t + V_B}{V_p + V_B} + \frac{1}{6} \qquad (3)$$

Observing eqn (3), the value of $R_n g_m$ is also dependent on $V_g'$, but only to a very small extent.

The modified expression for $R_n g_m$ when $t_{ox}$ is scaled is

 $Rngm = \frac{1}{2} \left[ \frac{Vg}{Vg} - \frac{1}{2} \left( \frac{a}{\cos x} \right)^2 \left( \frac{1 \pm \frac{vg}{\cos x}}{2} \right)^{1/2} - \frac{1}{6} \right] \frac{1}{6}$

\* If there is an increase in the value of $C_{ox}$, then $R_n g_m$ decreases by a small amount, which in turn decreases the ratio of logic levels to thermal noise by the same amount.

Switch logic:

Switch logic is based on pass transistors or on transmission gates.

\* This approach is fast for small arrays and takes no static current from the supply rails.

\* Hence the power dissipation of such arrays is small, since current flows only on switching.

\* Pass transistor logic is similar to logic arrays based on relay contacts.

\* The basic AND connections are set out as shown in the figure below, but many combinations of switches are possible.

$V_{out} = V_{in}$ when $A \cdot B \cdot C \cdot D = 1$. Here the logic levels will be degraded by $V_t$ effects.



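With transmission gates, logic levels are not degraded by $V_t$ effects.

Pass transistors and transmission gates:

Switches and switch logic may be formed from simple n-type or p-type pass transistors, or from transmission gates made up of an n-type and a p-type pass transistor in parallel, as shown in the figure below.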
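The $V_t$ degradation in pass-transistor switch logic can be illustrated with a very simple behavioural model. The sketch below assumes an idealised nMOS pass transistor that is fully on, ignores body effect, and treats every switch in the chain as closed; the function names and the 5 V / 1 V values are illustrative assumptions.

```python
def nmos_pass(v_in, v_gate, v_t):
    """Idealised nMOS pass transistor that is switched on:
    the output cannot rise above v_gate - v_t."""
    return min(v_in, max(v_gate - v_t, 0.0))


def series_and(v_in, gate_voltages, v_t):
    """Output of a chain of closed nMOS pass transistors (the AND connection).
    Open switches, which leave the output floating, are not modelled."""
    v = v_in
    for v_gate in gate_voltages:
        v = nmos_pass(v, v_gate, v_t)
    return v


VDD, VT = 5.0, 1.0
high_out = series_and(VDD, [VDD] * 4, VT)         # logic 1 through A, B, C, D
low_out = series_and(0.0, [VDD] * 4, VT)          # logic 0 passes undegraded
# Gate of one pass transistor driven by the output of another:
cascaded = nmos_pass(VDD, nmos_pass(VDD, VDD, VT), VT)
```

In this model a logic 1 is lost only once along a series chain whose gates are all at $V_{DD}$, but it is lost twice when one pass transistor drives the gate of another, which is why that arrangement is normally avoided.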

www.Jntufastupdates.com

Fig.: Some properties of pass transistors and transmission gates. A transmission gate passes good logic 0 and logic 1 levels from a good input. A pass transistor passes a degraded logic 1 of $V_{DD} - V_t$, and there is a further loss of logic 1, to $V_{DD} - 2V_t$, if the gate of a pass transistor is driven from another pass transistor.

Alternative Gate Circuits (or) Gate Logic:

CMOS circuits suffer from increased area, and correspondingly increased capacitance and delay, as the logic gates become more complex. For this reason, designers develop circuits that can be used to supplement the complementary circuits. These are not intended to replace CMOS, but can be used for some special purposes. We have several alternative gate logic circuits, as listed below:


1. Pseudo-nMOS logic
2. Dynamic CMOS logic
3. Clocked CMOS logic
4. Domino logic
5. n-p CMOS logic

Pseudo-nMOS logic:

Pseudo-nMOS logic is one type of alternative gate circuit that is used to supplement CMOS circuits. In a pseudo-nMOS circuit, the depletion-mode pull-up transistor is replaced with a p-MOS transistor whose gate terminal is always grounded.

[Figure: implementation of three-input NAND and NOR gates using pseudo-nMOS logic.]

Note: For $n$ inputs, pseudo-nMOS logic requires $n + 1$ transistors, whereas CMOS logic requires $2n$ transistors.

Dynamic CMOS logic:

The actual logic is implemented in the n-block. A p-transistor is used for the non-time-critical precharging, i.e. the output capacitance is charged to $V_{DD}$ during the off period of the clock signal ($\phi$). During this time, the inputs are applied to the n-block, and the state of the logic is then evaluated during the on period of the clock, when the bottom n-transistor is turned on.

[Figure: dynamic CMOS logic ($V_{DD}$, clock signal, n-block, $V_{SS}$).]

Clocked CMOS logic:

The logic is implemented in both n and p transistors, in the form of a pull-up p-block and a complementary pull-down n-block.


The logic in this case is evaluated only during the on period of the clock.

Domino CMOS logic:

An extension of dynamic CMOS logic is called domino CMOS logic. This is a modified arrangement that allows cascading of logic structures using only a single-phase clock. For this purpose, a buffer is used at the output.




n-p CMOS logic:

This is another version of the basic dynamic logic circuit.

The actual logic blocks, n and p, are arranged alternately in a cascaded structure. One block is fed with the clock signal ($\phi$) and the other block is fed with the complemented clock signal ($\bar{\phi}$).






Figure 3.1 Input-output characteristic of a nonlinear system.

approximation, and higher order terms are insignificant. In other words,  $\Delta y = \alpha_1 \Delta x$ , indicating a linear relationship between the *increments* at the input and output. As x(t) increases in magnitude, higher order terms manifest themselves, leading to nonlinearity and necessitating large-signal analysis. From another point of view, if the slope of the characteristic (the incremental gain) varies with the signal level, then the system is nonlinear. These concepts are described in detail in Chapter 13.

What aspects of the performance of an amplifier are important? In addition to gain and speed, such parameters as power dissipation, supply voltage, linearity, noise, or maximum voltage swings may be important. Furthermore, the input and output impedances determine how the circuit interacts with preceding and subsequent stages. In practice, most of these parameters trade with each other, making the design a multi-dimensional optimization problem. Illustrated in the "analog design octagon" of Fig. 3.2, such trade-offs present many challenges in the design of high-performance amplifiers, requiring intuition and experience to arrive at an acceptable compromise.



Figure 3.2 Analog design octagon.

# 3.2 Common-Source Stage

## 3.2.1 Common-Source Stage with Resistive Load

By virtue of its transconductance, a MOSFET converts variations in its gate-source voltage to a small-signal drain current, which can pass through a resistor to generate an output voltage. Shown in Fig. 3.3(a), the common-source (CS) stage performs such an operation.





We study both the large-signal and the small-signal behavior of the circuit. Note that the input impedance of the circuit is very high at low frequencies.

If the input voltage increases from zero,  $M_1$  is off and  $V_{out} = V_{DD}$  [Fig. 3.3(b)]. As  $V_{in}$  approaches  $V_{TH}$ ,  $M_1$  begins to turn on, drawing current from  $R_D$  and lowering  $V_{out}$ . If  $V_{DD}$  is not excessively low,  $M_1$  turns on in saturation, and we have

$$V_{out} = V_{DD} - R_D \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_{in} - V_{TH})^2, \qquad (3.3)$$

where channel-length modulation is neglected. With further increase in  $V_{in}$ ,  $V_{out}$  drops more and the transistor continues to operate in saturation until  $V_{in}$  exceeds  $V_{out}$  by  $V_{TH}$  [point A in Fig. 3.3(b)]. At this point,

$$V_{in1} - V_{TH} = V_{DD} - R_D \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_{in1} - V_{TH})^2, \qquad (3.4)$$

from which  $V_{in1} - V_{TH}$  and hence  $V_{out}$  can be calculated. For  $V_{in} > V_{in1}$ ,  $M_1$  is in the triode region:

$$V_{out} = V_{DD} - R_D \frac{1}{2} \mu_n C_{ox} \frac{W}{L} \left[ 2(V_{in} - V_{TH}) V_{out} - V_{out}^2 \right]. \qquad (3.5)$$

If  $V_{in}$  is high enough to drive  $M_1$  into deep triode region,  $V_{out} \ll 2(V_{in} - V_{TH})$ , and, from the equivalent circuit of Fig. 3.3(c),

$$V_{out} = V_{DD} \frac{R_{on}}{R_{on} + R_D} \qquad (3.6)$$

$$= \frac{V_{DD}}{1 + \mu_n C_{ox} \frac{W}{L} R_D (V_{in} - V_{TH})}. \qquad (3.7)$$
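The three operating regions of the resistively loaded CS stage can be stitched into a single transfer characteristic. The sketch below assumes square-law behaviour with channel-length modulation neglected, as in (3.3) to (3.5); `k` stands for $\mu_n C_{ox} W/L$, and the function names and numerical values are illustrative assumptions rather than anything given in the text.

```python
import math


def v_in1_edge(vdd, rd, k, vth):
    """Input voltage at point A, from (3.4).

    Solves x = vdd - (rd*k/2) * x**2 for x = V_in1 - V_TH > 0.
    """
    a = 0.5 * rd * k
    x = (-1.0 + math.sqrt(1.0 + 4.0 * a * vdd)) / (2.0 * a)
    return x + vth


def cs_vout(vin, vdd, rd, k, vth):
    """Large-signal V_out of the CS stage with resistive load."""
    if vin <= vth:
        return vdd                                  # M1 off
    vov = vin - vth
    a = 0.5 * rd * k
    if vin <= v_in1_edge(vdd, rd, k, vth):
        return vdd - a * vov ** 2                   # saturation, (3.3)
    # Triode, (3.5): a*vout^2 - (1 + 2*a*vov)*vout + vdd = 0.
    # The smaller root is the one satisfying vout < vov.
    b = 1.0 + 2.0 * a * vov
    return (b - math.sqrt(b * b - 4.0 * a * vdd)) / (2.0 * a)


# Assumed example values: VDD = 3 V, RD = 5 kOhm, k = 1 mA/V^2, VTH = 0.7 V.
VDD, RD, K, VTH = 3.0, 5e3, 1e-3, 0.7
edge = v_in1_edge(VDD, RD, K, VTH)
deep_exact = cs_vout(3.0, VDD, RD, K, VTH)
deep_approx = VDD / (1.0 + K * RD * (3.0 - VTH))    # (3.7)
```

The two branches meet at point A, where $V_{out} = V_{in1} - V_{TH}$, and for large $V_{in}$ the exact triode solution approaches the resistive-divider approximation (3.7).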

Since the transconductance drops in the triode region, we usually ensure that  $V_{out} > V_{in} - V_{TH}$ , operating to the left of point A in Fig. 3.3(b). Using (3.3) as the input-output characteristic and viewing its slope as the small-signal gain, we have:

$$A_{v} = \frac{\partial V_{out}}{\partial V_{in}} \qquad (3.8)$$

$$= -R_D \mu_n C_{ox} \frac{W}{L} (V_{in} - V_{TH}) \qquad (3.9)$$

$$= -g_{m}R_{D}. \qquad (3.10)$$

This result can be directly derived from the observation that  $M_1$  converts an input voltage change  $\Delta V_{in}$  to a drain current change  $g_m \Delta V_{in}$ , and hence an output voltage change  $-g_m R_D \Delta V_{in}$ . The small-signal model of Fig. 3.3(d) yields the same result.

Even though derived for small-signal operation, the equation  $A_v = -g_m R_D$  predicts certain effects if the circuit senses a large signal swing. Since  $g_m$  itself varies with the input signal according to  $g_m = \mu_n C_{ox}(W/L)(V_{GS} - V_{TH})$ , the gain of the circuit changes substantially if the signal is large. In other words, if the gain of the circuit varies significantly with the signal swing, then the circuit operates in the large-signal mode. The dependence of the gain upon the signal level leads to nonlinearity (Chapter 13), usually an undesirable effect.

A key result here is that to minimize the nonlinearity, the gain equation must be a weak function of signal-dependent parameters such as  $g_m$ . We present several examples of this concept in this chapter and in Chapter 13.

### Example 3.1.

Sketch the drain current and transconductance of  $M_1$  in Fig. 3.3(a) as a function of the input voltage.

### Solution

The drain current becomes significant for  $V_{in} > V_{TH}$ , eventually approaching  $V_{DD}/R_D$  if  $R_{on1} \ll R_D$  [Fig. 3.4(a)]. Since in saturation,  $g_m = \mu_n C_{ox}(W/L)(V_{in} - V_{TH})$ , the transconductance begins to rise for  $V_{in} > V_{TH}$ . In the triode region,  $g_m = \mu_n C_{ox}(W/L)V_{DS}$ , falling as  $V_{in}$  exceeds  $V_{in1}$  [Fig. 3.4(b)].

How do we maximize the voltage gain of a common-source stage? Writing (3.10) as

$$A_{v} = -\sqrt{2\mu_n C_{ox} \frac{W}{L} I_D}\, \frac{V_{RD}}{I_D}, \qquad (3.11)$$





Figure 3.4

where  $V_{RD}$  denotes the voltage drop across  $R_D$ , we have

$$A_{v} = -\sqrt{2\mu_{n}C_{ox}\frac{W}{L}}\, \frac{V_{RD}}{\sqrt{I_{D}}}. \qquad (3.12)$$

Thus, the magnitude of  $A_v$  can be increased by increasing W/L or  $V_{RD}$  or decreasing  $I_D$  if other parameters are constant. It is important to understand the trade-offs resulting from this equation. A larger device size leads to greater device capacitances, and a higher  $V_{RD}$  limits the maximum voltage swings. For example, if  $V_{DD} - V_{RD} = V_{in} - V_{TH}$ , then  $M_1$  is at the edge of the triode region, allowing only very small swings at the output (and input). If  $V_{RD}$ remains constant and  $I_D$  is reduced, then  $R_D$  must increase, thereby leading to a greater time constant at the output node. In other words, as noted in the analog design octagon, the circuit exhibits trade-offs between gain, bandwidth, and voltage swings. Lower supply voltages further tighten these trade-offs.
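The trade-off expressed by (3.12) can be explored numerically. This is a minimal sketch under the square-law assumption, with `k` standing for $\mu_n C_{ox} W/L$; the function names and bias values are illustrative and not taken from the text.

```python
import math


def cs_gain_from_bias(k, i_d, v_rd):
    """Gain from (3.12): A_v = -sqrt(2 k) * V_RD / sqrt(I_D)."""
    return -math.sqrt(2.0 * k) * v_rd / math.sqrt(i_d)


def cs_gain_gm_rd(k, i_d, r_d):
    """Gain from (3.10): A_v = -g_m R_D, with g_m = sqrt(2 k I_D)."""
    g_m = math.sqrt(2.0 * k * i_d)
    return -g_m * r_d


# Assumed bias point: k = 1 mA/V^2, I_D = 0.2 mA, R_D = 5 kOhm (V_RD = 1 V).
K, ID, RD = 1e-3, 0.2e-3, 5e3
gain_a = cs_gain_from_bias(K, ID, ID * RD)
gain_b = cs_gain_gm_rd(K, ID, RD)
# Halve I_D while holding V_RD at 1 V (so R_D doubles).
gain_low_current = cs_gain_from_bias(K, ID / 2.0, ID * RD)
```

Both forms agree at the same bias point, and halving $I_D$ at constant $V_{RD}$ raises the gain magnitude by $\sqrt{2}$ at the cost of doubling $R_D$ and hence the output time constant.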

For large values of $R_D$, the effect of channel-length modulation in $M_1$ becomes significant. Modifying (3.3) to include this effect,

$$V_{out} = V_{DD} - R_D \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_{in} - V_{TH})^2 (1 + \lambda V_{out}), \qquad (3.13)$$

we have

$$\frac{\partial V_{out}}{\partial V_{in}} = -R_D \mu_n C_{ox} \frac{W}{L} (V_{in} - V_{TH}) (1 + \lambda V_{out}) - R_D \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_{in} - V_{TH})^2 \lambda \frac{\partial V_{out}}{\partial V_{in}}. \qquad (3.14)$$

Using the approximation $I_D \approx (1/2)\mu_n C_{ox}(W/L)(V_{in} - V_{TH})^2$, we obtain:

$$A_{v} = -R_{D}g_{m} - R_{D}I_{D}\lambda A_{v}, \qquad (3.15)$$

Collecting the $A_v$ terms gives $A_{v}\left[1 + R_{D}I_{D}\lambda\right] = -R_{D}g_{m}$, and hence

$$A_{v} = -\frac{g_{m}R_{D}}{1+R_{D}\lambda I_{D}}. \qquad (3.16)$$

Since $\lambda I_{D} = 1/r_{O}$,

$$A_{v} = -g_{m}\frac{r_{O}R_{D}}{r_{O}+R_{D}}. \qquad (3.17)$$

The small-signal model of Fig. 3.5 gives the same result with much less effort. That is, since



Figure 3.5 Small-signal model of CS stage including the transistor output resistance.

 $g_m V_1(r_0 || R_D) = -V_{out}$  and  $V_1 = V_{in}$ , we have  $V_{out}/V_{in} = -g_m(r_0 || R_D)$ . Note that, as mentioned in Chapter 1,  $V_{in}$ ,  $V_1$ , and  $V_{out}$  in this figure denote small-signal quantities.
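As a quick numerical cross-check, the small-signal result $-g_m(r_O \| R_D)$ can be compared with the form obtained from the large-signal derivation, $-g_m R_D/(1 + R_D \lambda I_D)$. The helper names and bias values below are illustrative assumptions.

```python
def parallel(*resistances):
    """Equivalent resistance of resistors in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def cs_gain_with_ro(g_m, r_d, lam, i_d):
    """Small-signal gain -g_m (r_O || R_D), with r_O = 1 / (lambda I_D)."""
    r_o = 1.0 / (lam * i_d)
    return -g_m * parallel(r_o, r_d)


def cs_gain_large_signal_form(g_m, r_d, lam, i_d):
    """Gain written as -g_m R_D / (1 + R_D lambda I_D), as in (3.16)."""
    return -g_m * r_d / (1.0 + r_d * lam * i_d)


# Assumed values: g_m = 1 mS, R_D = 10 kOhm, lambda = 0.1 1/V, I_D = 0.5 mA.
gain_small = cs_gain_with_ro(1e-3, 10e3, 0.1, 0.5e-3)
gain_large = cs_gain_large_signal_form(1e-3, 10e3, 0.1, 0.5e-3)
```

With these values $r_O = 20\ \text{k}\Omega$, so the load seen at the drain drops from $10\ \text{k}\Omega$ to about $6.7\ \text{k}\Omega$ and the gain magnitude falls from 10 to roughly 6.7.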

### Example 3.2

Assuming  $M_1$  in Fig. 3.6 is biased in saturation, calculate the small-signal voltage gain of the circuit.



### Solution

Since  $I_1$  introduces an infinite impedance, the gain is limited by the output resistance of  $M_1$ :

$$A_v = -g_m r_O. \tag{3.18}$$

Called the "intrinsic gain" of a transistor, this quantity represents the maximum voltage gain that can be achieved using a single device. In today's CMOS technology,  $g_m r_0$  of short-channel devices is between roughly 10 and 30. Thus, we usually assume  $1/g_m \ll r_0$ .

In Fig. 3.6, Kirchhoff's current law (KCL) requires that  $I_{D1} = I_1$ . Then, how can  $V_{in}$  change the current of  $M_1$  if  $I_1$  is constant? Writing the total drain current of  $M_1$  as

$$I_{D1} = \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_{in} - V_{TH})^2 (1 + \lambda V_{out}) \qquad (3.19)$$

$$= I_1, \qquad (3.20)$$

we note that  $V_{in}$  appears in the square term and  $V_{out}$  in the linear term. As  $V_{in}$  increases,  $V_{out}$  must decrease such that the product remains constant. We may nevertheless say " $I_{D1}$  increases as  $V_{in}$  increases." This statement simply refers to the quadratic part of the equation.

## 3.2.2 CS Stage with Diode-Connected Load

In many CMOS technologies, it is difficult to fabricate resistors with tightly-controlled values or a reasonable physical size (Chapter 17). Consequently, it is desirable to replace  $R_D$  in Fig. 3.3(a) with a MOS transistor.

A MOSFET can operate as a small-signal resistor if its gate and drain are shorted [Fig. 3.7(a)]. Called a "diode-connected" device in analogy with its bipolar counterpart,



Figure 3.7 (a) Diode-connected NMOS and PMOS devices, (b) small-signal equivalent circuit.

this configuration exhibits a small-signal behavior similar to a two-terminal resistor. Note that the transistor is always in saturation because the drain and the gate have the same potential. Using the small-signal equivalent shown in Fig. 3.7(b) to obtain the impedance of the device, we write  $V_1 = V_X$  and  $I_X = V_X/r_0 + g_m V_X$ . That is, the impedance of the diode is simply equal to  $(1/g_m) || r_0 \approx 1/g_m$ . If body effect exists, we can use the circuit in Fig. 3.8 to write  $V_1 = -V_X$ ,  $V_{bs} = -V_X$  and



Figure 3.8 (a) Arrangement for measuring the equivalent resistance of a diode-connected MOSFET, (b) small-signal equivalent circuit.

$$(g_m + g_{mb})V_X + \frac{V_X}{r_0} = I_X.$$
 (3.21)

It follows that

$$\frac{V_X}{I_X} = \frac{1}{g_m + g_{mb} + r_O^{-1}} \qquad (3.22)$$

$$= \frac{1}{g_m + g_{mb}} \,\Big\|\, r_O \qquad (3.23)$$

$$\approx \frac{1}{g_m + g_{mb}}. \qquad (3.24)$$

Interestingly, the impedance seen at the source of $M_1$ is lower when body effect is included. Intuitive explanation of this effect is left as an exercise for the reader.
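The impedance of a diode-connected device is easy to evaluate for a few cases. The sketch below is a direct transcription of $1/(g_m + g_{mb} + r_O^{-1})$; the function name and the example conductances are illustrative assumptions.

```python
def diode_connected_impedance(g_m, g_mb=0.0, r_o=float("inf")):
    """Small-signal impedance 1 / (g_m + g_mb + 1/r_O) of a diode-connected MOSFET."""
    return 1.0 / (g_m + g_mb + 1.0 / r_o)


# Assumed values: g_m = 1 mS, g_mb = 0.2 mS, r_O = 50 kOhm.
z_ideal = diode_connected_impedance(1e-3)
z_body = diode_connected_impedance(1e-3, g_mb=0.2e-3)
z_full = diode_connected_impedance(1e-3, g_mb=0.2e-3, r_o=50e3)
```

Including body effect lowers the impedance from $1\ \text{k}\Omega$ to about $833\ \Omega$ here, while the finite $r_O$ changes the result only slightly, which is why the approximation $1/(g_m + g_{mb})$ is usually adequate.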

We now study a common-source stage with a diode-connected load (Fig. 3.9). For negligible channel-length modulation, (3.24) can be substituted in (3.10) for the load impedance, yielding

$$A_v = -g_{m1} \frac{1}{g_{m2} + g_{mb2}} \qquad (3.25)$$

$$= -\frac{g_{m1}}{g_{m2}} \frac{1}{1+\eta}, \qquad (3.26)$$

where $\eta = g_{mb2}/g_{m2}$. Expressing $g_{m1}$ and $g_{m2}$ in terms of device dimensions and bias currents, we have

$$A_{\nu} = -\frac{\sqrt{2\mu_n C_{ox}(W/L)_1 I_{D1}}}{\sqrt{2\mu_n C_{ox}(W/L)_2 I_{D2}}} \frac{1}{1+\eta},$$
(3.27)

and, since  $I_{D1} = I_{D2}$ ,

$$A_{v} = -\sqrt{\frac{(W/L)_{1}}{(W/L)_{2}}} \frac{1}{1+\eta}. \qquad (3.28)$$

This equation reveals an interesting property: if the variation of $\eta$ with the output voltage is neglected, the gain is independent of the bias currents and voltages (so long as $M_1$ stays in saturation). In other words, as the input and output signal levels vary, the gain remains relatively constant, indicating that the input-output characteristic is relatively linear.

The linear behavior of the circuit can also be confirmed by large-signal analysis. Neglecting channel-length modulation for simplicity, we have in Fig. 3.9

$$\frac{1}{2}\mu_n C_{ox} \left(\frac{W}{L}\right)_1 (V_{in} - V_{TH1})^2 = \frac{1}{2}\mu_n C_{ox} \left(\frac{W}{L}\right)_2 (V_{DD} - V_{out} - V_{TH2})^2, \quad (3.29)$$

and hence

$$\sqrt{\left(\frac{W}{L}\right)_{1}}(V_{in} - V_{TH1}) = \sqrt{\left(\frac{W}{L}\right)_{2}}(V_{DD} - V_{out} - V_{TH2}).$$
 (3.30)


Thus, if the variation of  $V_{TH2}$  with  $V_{out}$  is small, the circuit exhibits a linear input-output characteristic. The small-signal gain can also be computed by differentiating both sides with respect to  $V_{in}$ :

$$\sqrt{\left(\frac{W}{L}\right)_{1}} = \sqrt{\left(\frac{W}{L}\right)_{2}} \left(-\frac{\partial V_{out}}{\partial V_{in}} - \frac{\partial V_{TH2}}{\partial V_{in}}\right), \qquad (3.31)$$

which, upon application of the chain rule  $\partial V_{TH2}/\partial V_{in} = (\partial V_{TH2}/\partial V_{out})(\partial V_{out}/\partial V_{in}) = \eta(\partial V_{out}/\partial V_{in})$ , reduces to

$$\frac{\partial V_{out}}{\partial V_{in}} = -\sqrt{\frac{(W/L)_1}{(W/L)_2}} \frac{1}{1+\eta}.$$
(3.32)

It is instructive to study the overall large-signal characteristic of the circuit as well. But let us first consider the circuit shown in Fig. 3.10(a). What is the final value of  $V_{out}$  if  $I_1$  drops to zero? As  $I_1$  decreases, so does the overdrive of  $M_2$ . Thus, for small  $I_1$ ,  $V_{GS2} \approx V_{TH2}$ and  $V_{out} \approx V_{DD} - V_{TH2}$ . In reality, the subthreshold conduction in  $M_2$  eventually brings  $V_{out}$  to  $V_{DD}$  if  $I_D$  approaches zero, but at very low current levels, the finite capacitance at the output node slows down the change from  $V_{DD} - V_{TH2}$  to  $V_{DD}$ . This is illustrated in the time-domain waveforms of Fig. 3.10(b). For this reason, in circuits that have frequent switching activity, we assume  $V_{out}$  remains around  $V_{DD} - V_{TH2}$  when  $I_1$  falls to small values.

Now we return to the circuit of Fig. 3.9. Plotted in Fig. 3.11 versus  $V_{in}$ , the output voltage equals  $V_{DD} - V_{TH2}$  if  $V_{in} < V_{TH1}$ . For  $V_{in} > V_{TH1}$ , Eq. (3.30) holds and  $V_{out}$  follows an approximately straight line. As  $V_{in}$  exceeds  $V_{out} + V_{TH1}$  (beyond point A),  $M_1$  enters the triode region, and the characteristic becomes nonlinear.
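The straight-line segment of this characteristic follows directly from (3.30). The sketch below solves (3.30) for $V_{out}$, assuming body effect is ignored (constant $V_{TH2}$) and that $M_1$ remains in saturation; the function name and the device values are illustrative assumptions.

```python
import math


def diode_load_vout(vin, vdd, wl1, wl2, vth1, vth2):
    """V_out of a CS stage with diode-connected NMOS load, from (3.30).

    Valid while M1 is off or in saturation; body effect is ignored.
    """
    if vin <= vth1:
        return vdd - vth2
    return vdd - vth2 - math.sqrt(wl1 / wl2) * (vin - vth1)


# Assumed values: VDD = 3 V, (W/L)1 = 16, (W/L)2 = 1, both thresholds 0.7 V.
args = dict(vdd=3.0, wl1=16.0, wl2=1.0, vth1=0.7, vth2=0.7)
v_a = diode_load_vout(0.9, **args)
v_b = diode_load_vout(1.0, **args)
slope = (v_b - v_a) / (1.0 - 0.9)
```

The computed slope equals $-\sqrt{(W/L)_1/(W/L)_2}$ anywhere on the linear segment, matching the small-signal gain with $\eta = 0$.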

The diode-connected load of Fig. 3.9 can be implemented with a PMOS device as well. Shown in Fig. 3.12, the circuit is free from body effect, providing a small-signal voltage gain equal to

$$A_v = -\sqrt{\frac{\mu_n(W/L)_1}{\mu_p(W/L)_2}}, \qquad (3.33)$$

where channel-length modulation is neglected.



Figure 3.10 (a) Diode-connected device with stepped bias current, (b) variation of source voltage versus time.



Figure 3.11 Input-output characteristic of a CS stage with diode-connected load.



Figure 3.12 CS stage with diode-connected PMOS device.

Equations (3.28) and (3.33) indicate that the gain of a common-source stage with diode-connected load is a relatively weak function of the device dimensions. For example, to achieve a gain of 10, $\mu_n(W/L)_1/[\mu_p(W/L)_2] = 100$, implying that, with $\mu_n \approx 2\mu_p$, we must have $(W/L)_1 \approx 50(W/L)_2$. In a sense, a high gain requires a "strong" input device and a "weak" load device. In addition to disproportionately wide or long transistors (and hence a large input or load capacitance), a high gain translates to another important limitation: reduction in allowable voltage swings. Specifically, since in Fig. 3.12, $I_{D1} = |I_{D2}|$,

$$\mu_n \left(\frac{W}{L}\right)_1 (V_{GS1} - V_{TH1})^2 \approx \mu_p \left(\frac{W}{L}\right)_2 (V_{GS2} - V_{TH2})^2, \qquad (3.34)$$

revealing that

$$\frac{|V_{GS2} - V_{TH2}|}{|V_{GS1} - V_{TH1}|} \approx A_v.$$
(3.35)

In the above example, the overdrive voltage of $M_2$ must be 10 times that of $M_1$. For example, with $V_{GS1} - V_{TH1} = 200$ mV, and $|V_{TH2}| = 0.7$ V, we have $|V_{GS2}| = 2.7$ V, severely limiting the output swing. This is another example of the trade-offs suggested by the analog design octagon. Note that, with diode-connected loads, the swing is constrained by both the required overdrive voltage and the threshold voltage. That is, even with a small overdrive, the output level cannot exceed $V_{DD} - |V_{TH}|$.
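The numbers in this example can be reproduced with two one-line relations. The sketch assumes the square-law results (3.33) and (3.35) with channel-length modulation neglected; the function names are illustrative.

```python
def required_wl_ratio(gain, mu_ratio):
    """(W/L)1 / (W/L)2 needed for a given |A_v|, from (3.33).

    mu_ratio is mu_n / mu_p.
    """
    return gain ** 2 / mu_ratio


def load_vgs(gain, vov1, vth2_abs):
    """|V_GS2| of the diode-connected load, using (3.35):
    |V_GS2 - V_TH2| = |A_v| * (V_GS1 - V_TH1)."""
    return abs(gain) * vov1 + vth2_abs


wl_ratio = required_wl_ratio(gain=10, mu_ratio=2.0)
vgs2 = load_vgs(gain=10, vov1=0.2, vth2_abs=0.7)
```

Because the required size ratio grows with the square of the gain while the load overdrive grows linearly, both area and headroom become prohibitive well before the gain reaches large values.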

An interesting paradox arises here if we write  $g_m = \mu C_{ox}(W/L)|V_{GS} - V_{TH}|$ . The voltage gain of the circuit is then given by

$$A_v = \frac{g_{m1}}{g_{m2}} \tag{3.36}$$

$$=\frac{\mu_n C_{ox}(W/L)_1 (V_{GS1} - V_{TH1})}{\mu_p C_{ox}(W/L)_2 |V_{GS2} - V_{TH2}|}.$$
(3.37)

Equation (3.37) implies that  $A_v$  is *inversely* proportional to  $|V_{GS2} - V_{TH2}|$ . It is left for the reader to resolve the seemingly opposite trends suggested by (3.35) and (3.37).

### Example 3.3.

In the circuit of Fig. 3.13,  $M_1$  is biased in saturation with a drain current equal to  $I_1$ . The current source  $I_S = 0.75I_1$  is added to the circuit. How is (3.35) modified for this case?

### Solution

Since  $|I_{D2}| = I_1/4$ , we have

$$A_v \approx -\frac{g_{m1}}{g_{m2}} \tag{3.38}$$

$$= -\sqrt{\frac{4\mu_n(W/L)_1}{\mu_p(W/L)_2}}.$$
(3.39)



Figure 3.13

Moreover,

$$\mu_n \left(\frac{W}{L}\right)_1 (V_{GS1} - V_{TH1})^2 \approx 4\mu_p \left(\frac{W}{L}\right)_2 (V_{GS2} - V_{TH2})^2, \qquad (3.40)$$

yielding

$$\frac{|V_{GS2} - V_{TH2}|}{|V_{GS1} - V_{TH1}|} \approx \frac{A_v}{4}.$$
(3.41)

Thus, for a gain of 10, the overdrive of  $M_2$  need be only 2.5 times that of  $M_1$ . Alternatively, for a given overdrive voltage, this circuit achieves a gain four times that of the stage in Fig. 3.12. Intuitively, this is because for a given  $|V_{GS2} - V_{TH2}|$ , if the current decreases by a factor of 4, then  $(W/L)_2$  must decrease proportionally, and  $g_{m2} = \sqrt{2\mu_p C_{ox}(W/L)_2 I_{D2}}$  is lowered by the same factor.
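The example can be generalised slightly to see how the result depends on the fraction of current diverted from the load. In the sketch below, `x` is the fraction $I_S/I_1$ (0.75 in the example) and `ratio` is $\mu_n(W/L)_1/[\mu_p(W/L)_2]$; this parameterisation and the function names are assumptions introduced for illustration, with square-law devices and no channel-length modulation.

```python
import math


def gain_with_bleed(ratio, x):
    """A_v = -g_m1/g_m2 when the load carries (1 - x) of the input device current."""
    return -math.sqrt(ratio / (1.0 - x))


def overdrive_ratio(ratio, x):
    """|V_GS2 - V_TH2| / (V_GS1 - V_TH1) for the same circuit."""
    return math.sqrt(ratio * (1.0 - x))


gain = gain_with_bleed(ratio=25.0, x=0.75)
od_ratio = overdrive_ratio(ratio=25.0, x=0.75)
```

For $x = 0.75$ the overdrive ratio equals $|A_v|/4$, in agreement with (3.41); setting $x = 0$ recovers the relation for the stage without the current source.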

We should also mention that in today's CMOS technology, channel-length modulation is quite significant and, more importantly, the behavior of transistors notably departs from the square law (Chapter 16). Thus, the gain of the stage in Fig. 3.9 must be expressed as

$$A_{v} = -g_{m1}\left(\frac{1}{g_{m2}} \|r_{O1}\|r_{O2}\right), \qquad (3.42)$$

where  $g_{m1}$  and  $g_{m2}$  must be obtained as described in Chapter 16.

## 3.2.3 CS Stage with Current-Source Load

In applications requiring a large voltage gain in a single stage, the relationship $A_v = -g_m R_D$ suggests that we increase the load impedance of the CS stage. With a resistor or diode-connected load, however, increasing the load resistance limits the output voltage swing.

A more practical approach is to replace the load with a current source. Described briefly in Example 3.2, the resulting circuit is shown in Fig. 3.14, where both transistors operate in saturation. Since the total impedance seen at the output node is equal to  $r_{01} || r_{02}$ , the gain is



Figure 3.14 CS stage with current-source load.

$$A_{\nu} = -g_{m1}(r_{O1} || r_{O2}). \tag{3.43}$$

The key point here is that the output impedance and the minimum required $|V_{DS}|$ of $M_2$ are less strongly coupled than the value and voltage drop of a resistor. The voltage $|V_{DS2,min}| = |V_{GS2} - V_{TH2}|$ can be reduced to even a few hundred millivolts by simply increasing the width of $M_2$. If $r_{O2}$ is not sufficiently high, the length and width of $M_2$ can be increased to achieve a smaller $\lambda$ while maintaining the same overdrive voltage. The penalty is the large capacitance introduced by $M_2$ at the output node.

We should remark that the output bias voltage of the circuit in Fig. 3.14 is not well-defined. Thus, the stage is reliably biased only if a feedback loop forces $V_{out}$ to a known value (Chapter 8). The large-signal analysis of the circuit is left as an exercise for the reader.

As explained in Chapter 2, the output impedance of MOSFETs at a given drain current can be scaled by changing the channel length, i.e., to the first order,  $\lambda \propto 1/L$  and hence  $r_0 \propto L/I_D$ . Since the gain of the stage shown in Fig. 3.14 is proportional to  $r_{01} || r_{02}$ , we may surmise that longer transistors yield a higher voltage gain.

Let us consider  $M_1$  and  $M_2$  separately. If  $L_1$  is scaled by a factor  $\alpha (> 1)$ , then  $W_1$  may need to be scaled proportionally as well. This is because, for a given drain current,  $V_{GS1}$  –  $V_{TH1} \propto 1/\sqrt{(W/L)_1}$ , i.e., if  $W_1$  is not scaled, the overdrive voltage increases, limiting the output voltage swing. Also, since  $g_{m1} \propto \sqrt{(W/L)_1}$ , scaling up only  $L_1$  lowers  $g_{m1}$ .

In applications where these issues are unimportant,  $W_1$  can remain constant while  $L_1$ increases. Thus, the intrinsic gain of the transistor can be written as

$$g_{m1}r_{O1} = \sqrt{2\left(\frac{W}{L}\right)_{1}\mu_{n}C_{ox}I_{D}}\;\frac{1}{\lambda I_{D}}, \qquad (3.44)$$

indicating that the gain *increases* with L because  $\lambda$  depends more strongly on L than  $g_m$ does. Also, note that  $g_m r_0$  decreases as  $I_D$  increases.

Increasing  $L_2$  while keeping  $W_2$  constant increases  $r_{02}$  and hence the voltage gain, but at the cost of higher  $|V_{DS2}|$  required to maintain  $M_2$  in saturation.
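The dependence of the intrinsic gain on $L$ and $I_D$ can be checked with a short calculation. The sketch assumes the first-order model $\lambda \propto 1/L$ stated above, anchored at a reference length `l_ref` with value `lam_ref`; these reference parameters, the function name, and the device values are illustrative assumptions.

```python
import math


def intrinsic_gain(w, l, mu_cox, i_d, lam_ref, l_ref):
    """g_m * r_O from (3.44), with lambda = lam_ref * l_ref / l (first-order model)."""
    lam = lam_ref * l_ref / l
    g_m = math.sqrt(2.0 * mu_cox * (w / l) * i_d)
    return g_m / (lam * i_d)


# Assumed values: W = 10 um, mu_n*Cox = 100 uA/V^2, I_D = 0.1 mA,
# lambda = 0.1 1/V at L = 1 um.
base = intrinsic_gain(10e-6, 1e-6, 100e-6, 0.1e-3, 0.1, 1e-6)
longer = intrinsic_gain(10e-6, 2e-6, 100e-6, 0.1e-3, 0.1, 1e-6)
more_current = intrinsic_gain(10e-6, 1e-6, 100e-6, 0.4e-3, 0.1, 1e-6)
```

Doubling $L$ at constant $W$ and $I_D$ raises $g_m r_O$ by $\sqrt{2}$ (since $r_O$ doubles while $g_m$ falls by $\sqrt{2}$), and quadrupling $I_D$ halves it.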

## 3.2.4 CS Stage with Triode Load

A MOS device operating in deep triode region behaves as a resistor and can therefore serve as the load in a CS stage. Illustrated in Fig. 3.15, such a circuit biases the gate of $M_2$ at a sufficiently low level, ensuring the load is in deep triode region for all output voltage swings.



Figure 3.15 CS stage with triode load.


Since

$$R_{on2} = \frac{1}{\mu_p C_{ox} (W/L)_2 (V_{DD} - V_b - |V_{THP}|)},$$
(3.45)

the voltage gain can be readily calculated.

The principal drawback of this circuit stems from the dependence of $R_{on2}$ upon $\mu_p C_{ox}$, $V_b$, and $V_{THP}$. Since $\mu_p C_{ox}$ and $V_{THP}$ vary with process and temperature and since generating a precise value for $V_b$ requires additional complexity, this circuit is difficult to use. Triode loads, however, consume less voltage headroom than do diode-connected devices because in Fig. 3.15 $V_{out,max} = V_{DD}$ whereas in Fig. 3.12, $V_{out,max} \approx V_{DD} - |V_{THP}|$.
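The sensitivity of the triode load to process parameters can be seen by evaluating (3.45) for slightly different values. In the sketch below the gain is taken as $-g_{m1} R_{on2}$, which assumes $r_{O1}$ is much larger than $R_{on2}$; the function name and all numerical values are illustrative assumptions.

```python
def r_on2(mu_p_cox, wl2, vdd, vb, vthp_abs):
    """On-resistance of the deep-triode PMOS load, from (3.45)."""
    overdrive = vdd - vb - vthp_abs
    if overdrive <= 0:
        raise ValueError("M2 is off: V_DD - V_b must exceed |V_THP|")
    return 1.0 / (mu_p_cox * wl2 * overdrive)


# Assumed values: mu_p*Cox = 50 uA/V^2, (W/L)2 = 10, VDD = 3 V, Vb = 0.8 V, |VTHP| = 0.7 V.
nominal = r_on2(50e-6, 10.0, 3.0, 0.8, 0.7)
fast_process = r_on2(55e-6, 10.0, 3.0, 0.8, 0.7)    # +10 % in mu_p*Cox
high_vth = r_on2(50e-6, 10.0, 3.0, 0.8, 0.8)        # +100 mV in |VTHP|

g_m1 = 2e-3                                          # assumed input transconductance
gain_nominal = -g_m1 * nominal
gain_high_vth = -g_m1 * high_vth
```

A 100 mV shift in $|V_{THP}|$ changes $R_{on2}$, and hence the gain, by about 7 % in this example, which illustrates why the circuit is hard to use without a well-controlled $V_b$.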

## 3.2.5 CS Stage with Source Degeneration

In some applications, the square-law dependence of the drain current upon the overdrive voltage introduces excessive nonlinearity, making it desirable to "soften" the device characteristic. In Section 3.2.2, we noted the linear behavior of a CS stage using a diode-connected load. Alternatively, as depicted in Fig. 3.16, this can be accomplished by placing a "degeneration" resistor in series with the source terminal. Here, as  $V_{in}$  increases, so do  $I_D$  and the



Figure 3.16 CS stage with source degeneration.

voltage drop across  $R_S$ . That is, a fraction of  $V_{in}$  appears across the resistor rather than as the gate-source overdrive, thus leading to a smoother variation of  $I_D$ . From another perspective, we intend to make the gain equation a weaker function of  $g_m$ . Since  $V_{out} = -I_D R_D$ , the nonlinearity of the circuit arises from the nonlinear dependence of  $I_D$  upon  $V_{in}$ . We note that  $\partial V_{out}/\partial V_{in} = -(\partial I_D/\partial V_{in})R_D$ , and define the equivalent transconductance of the circuit as  $G_m = \partial I_D/\partial V_{in}$ . Now, assuming  $I_D = f(V_{GS})$ , we write

$$G_{m} = \frac{\partial I_{D}}{\partial V_{in}} \tag{3.46}$$

$$= \frac{\partial f}{\partial V_{GS}} \frac{\partial V_{GS}}{\partial V_{in}}. \tag{3.47}$$

Since  $V_{GS} = V_{in} - I_D R_S$ , we have  $\partial V_{GS} / \partial V_{in} = 1 - R_S \partial I_D / \partial V_{in}$ , obtaining

$$G_m = \left(1 - R_S \frac{\partial I_D}{\partial V_{in}}\right) \frac{\partial f}{\partial V_{GS}}.$$
(3.48)

But,  $\partial f / \partial V_{GS}$  is the transconductance of  $M_1$ , and

$$G_m = \frac{g_m}{1 + g_m R_s}.$$
(3.49)

The small-signal voltage gain is thus equal to

$$A_{\nu} = -G_m R_D \tag{3.50}$$

$$=\frac{-g_m R_D}{1+g_m R_S}.$$
(3.51)

The same result can be derived using the small-signal model of Fig. 3.16(b). Equation (3.49) implies that as  $R_S$  increases,  $G_m$  becomes a weaker function of  $g_m$  and hence the drain current. In fact, for  $R_S \gg 1/g_m$ , we have  $G_m \approx 1/R_S$ , i.e.,  $\Delta I_D \approx \Delta V_{in}/R_S$ , indicating that most of the change in  $V_{in}$  appears across  $R_S$ . We say the drain current is a "linearized" function of the input voltage. The linearization is obtained at the cost of lower gain [and higher noise (Chapter 7)].
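The linearizing effect of Eq. (3.49) can be illustrated numerically. The following is a minimal sketch, not part of the original text; the values chosen for $g_m$, $R_S$, and $R_D$ are hypothetical and serve only to show how a fourfold change in $g_m$ translates into a much smaller change in $G_m$ once $R_S$ is present.

```python
def degenerated_gm(gm: float, rs: float) -> float:
    """Equivalent transconductance, Eq. (3.49): G_m = g_m / (1 + g_m * R_S)."""
    return gm / (1.0 + gm * rs)


def degenerated_cs_gain(gm: float, rd: float, rs: float) -> float:
    """Small-signal voltage gain, Eq. (3.51): A_v = -g_m R_D / (1 + g_m R_S)."""
    return -degenerated_gm(gm, rs) * rd


# Hypothetical component values, chosen only for illustration.
RS = 2e3   # degeneration resistor in ohms
RD = 10e3  # drain resistor in ohms

for gm in (1e-3, 2e-3, 4e-3):  # transconductance of M1 in siemens
    Gm = degenerated_gm(gm, RS)
    Av = degenerated_cs_gain(gm, RD, RS)
    print(f"gm = {gm * 1e3:.1f} mS  ->  Gm = {Gm * 1e3:.3f} mS, Av = {Av:.2f}")
```

As $R_S$ grows, the printed $G_m$ values crowd toward $1/R_S$, which is the behavior the text describes as linearization.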



Figure 3.17 Small-signal equivalent circuit of a degenerated CS stage.

For our subsequent calculations, it is useful to determine  $G_m$  in the presence of body effect and channel-length modulation. With the aid of the equivalent circuit shown in Fig. 3.17, we recognize that the current through  $R_s$  equals  $I_{out}$  and, therefore,  $V_{in} = V_1 + I_{out}R_s$ . Summing the currents at node X, we have

$$I_{out} = g_m V_1 - g_{mb} V_X - \frac{I_{out} R_S}{r_O}$$
(3.52)

$$= g_m(V_{in} - I_{out}R_S) + g_{mb}(-I_{out}R_S) - \frac{I_{out}R_S}{r_0}.$$
 (3.53)

It follows that

$$G_m = \frac{I_{out}}{V_{in}} \tag{3.54}$$

$$=\frac{g_m r_O}{R_S+[1+(g_m+g_{mb})R_S]r_O}. \tag{3.55}$$
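The algebra between Eq. (3.53) and the final expression for $G_m$ is short and can be sketched as follows (an added intermediate step, using only the symbols already defined). Collecting the $I_{out}$ terms of Eq. (3.53) gives

$$I_{out}\left[1 + (g_m + g_{mb})R_S + \frac{R_S}{r_O}\right] = g_m V_{in},$$

and multiplying both sides by $r_O$ yields

$$I_{out}\left\{R_S + [1 + (g_m + g_{mb})R_S]\,r_O\right\} = g_m r_O V_{in},$$

from which the ratio $I_{out}/V_{in}$ follows directly.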



Figure 3.25 Modeling output port of an amplifier by a Norton equivalent.

Defining  $G_m = I_{out}/V_{in}$ , we have  $V_{out} = -G_m V_{in} R_{out}$ . This lemma proves useful if  $G_m$ and  $R_{out}$  can be determined by inspection.

Example 3.6 -

Calculate the voltage gain of the circuit shown in Fig. 3.26. Assume  $I_0$  is ideal.



### Solution

The transconductance and output resistance of the stage are given by Eqs. (3.55) and (3.60), respectively. Thus,

$$A_{v} = -\frac{g_{m}r_{O}}{R_{s} + [1 + (g_{m} + g_{mb})R_{s}]r_{O}} \{ [1 + (g_{m} + g_{mb})r_{O}]R_{s} + r_{O} \}$$
(3.74)

$$= -g_m r_O. \tag{3.75}$$

Interestingly, the voltage gain is equal to the intrinsic gain of the transistor and independent of  $R_S$ . This is because, if  $I_0$  is ideal, the current through  $R_S$  cannot change and hence the small-signal voltage drop across  $R_S$  is zero—as if  $R_S$  were zero itself.
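The cancellation in Eq. (3.74) can be checked numerically. This is a small sketch under assumed values: the numbers for $g_m$, $g_{mb}$, and $r_O$ are hypothetical, and the output-resistance expression is simply the bracketed factor that appears in Eq. (3.74).

```python
def gm_equiv(gm: float, gmb: float, ro: float, rs: float) -> float:
    """Equivalent transconductance of the degenerated stage, Eq. (3.55)."""
    return gm * ro / (rs + (1.0 + (gm + gmb) * rs) * ro)


def rout_degenerated(gm: float, gmb: float, ro: float, rs: float) -> float:
    """Output resistance factor used in Eq. (3.74)."""
    return (1.0 + (gm + gmb) * ro) * rs + ro


def gain_with_ideal_current_source(gm: float, gmb: float, ro: float, rs: float) -> float:
    """A_v = -G_m * R_out, as in Eq. (3.74)."""
    return -gm_equiv(gm, gmb, ro, rs) * rout_degenerated(gm, gmb, ro, rs)


# Hypothetical small-signal parameters.
gm, gmb, ro = 1e-3, 0.2e-3, 50e3

for rs in (0.0, 1e3, 10e3, 100e3):
    av = gain_with_ideal_current_source(gm, gmb, ro, rs)
    print(f"RS = {rs:8.0f} ohm  ->  Av = {av:.4f}  (-gm*rO = {-gm * ro:.4f})")
```

For every value of $R_S$ the printed gain agrees with $-g_m r_O$ to within floating-point rounding, consistent with Eq. (3.75).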

# 3.3 Source Follower

Our analysis of the common-source stage indicates that, to achieve a high voltage gain with limited supply voltage, the load impedance must be as large as possible. If such a stage is to drive a low-impedance load, then a "buffer" must be placed after the amplifier so as to drive the load with negligible loss of the signal level. The source follower (also called the "common-drain" stage) can operate as a voltage buffer.

Illustrated in Fig. 3.27(a), the source follower senses the signal at the gate and drives

### Chap. 3 Single-Stage Amplifiers



Figure 3.27 (a) Source follower, and (b) its input-output characteristic.

the load at the source, allowing the source potential to "follow" the gate voltage. Beginning with the large-signal behavior, we note that for  $V_{in} < V_{TH}$ ,  $M_1$  is off and  $V_{out} = 0$ . As  $V_{in}$  exceeds  $V_{TH}$ ,  $M_1$  turns on in saturation (for typical values of  $V_{DD}$ ) and  $I_{D1}$  flows through  $R_S$  [Fig. 3.27(b)]. As  $V_{in}$  increases further,  $V_{out}$  follows the input with a difference (level shift) equal to  $V_{GS}$ . We can express the input-output characteristic as:

$$\frac{1}{2}\mu_n C_{ox} \frac{W}{L} (V_{in} - V_{TH} - V_{out})^2 R_S = V_{out}.$$
(3.76)

Let us calculate the small-signal gain of the circuit by differentiating both sides of (3.76) with respect to  $V_{in}$ :

$$\frac{1}{2}\mu_{n}C_{ox}\frac{W}{L}\,2(V_{in}-V_{TH}-V_{out})\left(1-\frac{\partial V_{TH}}{\partial V_{in}}-\frac{\partial V_{out}}{\partial V_{in}}\right)R_{S}=\frac{\partial V_{out}}{\partial V_{in}}. \tag{3.77}$$

Since  $\partial V_{TH}/\partial V_{in}=\eta\,\partial V_{out}/\partial V_{in}$ ,

$$\frac{\partial V_{out}}{\partial V_{in}} = \frac{\mu_n C_{ox} \frac{W}{L} (V_{in} - V_{TH} - V_{out}) R_S}{1 + \mu_n C_{ox} \frac{W}{L} (V_{in} - V_{TH} - V_{out}) R_S (1 + \eta)}. \tag{3.78}$$

Also, note that

$$g_m = \mu_n C_{ox} \frac{W}{L} (V_{in} - V_{TH} - V_{out}). \tag{3.79}$$

Consequently,

$$A_{\nu} = \frac{g_m R_s}{1 + (g_m + g_{mb})R_s}.$$
 (3.80)

The same result is more easily obtained with the aid of a small-signal equivalent circuit. From Fig. 3.28, we have  $V_{in} - V_1 = V_{out}$ ,  $V_{bs} = -V_{out}$ , and  $g_m V_1 - g_{mb} V_{out} = V_{out}/R_s$ .




Figure 3.28 Small-signal equivalent circuit of source follower.



Figure 3.29 Voltage gain of source follower versus input voltage.

Thus,  $V_{out}/V_{in} = g_m R_S / [1 + (g_m + g_{mb})R_S].$ 

Sketched in Fig. 3.29 vs.  $V_{in}$ , the voltage gain begins from zero for  $V_{in} \approx V_{TH}$  (that is,  $g_m \approx 0$ ) and monotonically increases. As the drain current and  $g_m$  increase,  $A_v$  approaches  $g_m/(g_m + g_{mb}) = 1/(1 + \eta)$ . Since  $\eta$  itself slowly decreases with  $V_{out}$ ,  $A_v$  would eventually become equal to unity, but for typical allowable source-bulk voltages,  $\eta$  remains greater than roughly 0.2.

An important result of (3.80) is that even if  $R_s = \infty$ , the voltage gain of a source follower is not equal to one. We return to this point later. Note that  $M_1$  in Fig. 3.27 does not enter the triode region if  $V_{in}$  remains below  $V_{DD}$ .
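Equation (3.80) and its limit for large $R_S$ can be explored with a few lines of code. This is only an illustrative sketch; the values of $g_m$ and $g_{mb}$ (corresponding to $\eta = 0.2$) are hypothetical.

```python
import math


def follower_gain(gm: float, gmb: float, rs: float) -> float:
    """Source-follower gain, Eq. (3.80): A_v = g_m R_S / (1 + (g_m + g_mb) R_S)."""
    if math.isinf(rs):
        # Limit of Eq. (3.80) for R_S -> infinity: g_m / (g_m + g_mb) = 1 / (1 + eta).
        return gm / (gm + gmb)
    return gm * rs / (1.0 + (gm + gmb) * rs)


# Hypothetical small-signal parameters (eta = gmb / gm = 0.2).
gm, gmb = 1e-3, 0.2e-3

for rs in (1e3, 10e3, 100e3, math.inf):
    print(f"RS = {rs:>9}  ->  Av = {follower_gain(gm, gmb, rs):.4f}")
```

The gain rises with $R_S$ but saturates at $1/(1+\eta)$, below unity, which is the point made about $R_S = \infty$.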

In the source follower of Fig. 3.27, the drain current of  $M_1$  heavily depends on the input dc level. For example, if  $V_{in}$  changes from 1.5 V to 2 V,  $I_D$  may increase by a factor of 2 and hence  $V_{GS} - V_{TH}$  by  $\sqrt{2}$, thereby introducing substantial nonlinearity in the input-output characteristic. To alleviate this issue, the resistor can be replaced by a current source as shown in Fig. 3.30(a). The current source itself is implemented as an NMOS transistor operating in the saturation region [Fig. 3.30(b)].



Figure 3.30 Source follower using an NMOS transistor as current source.

### Example 3.7.

Suppose in the source follower of Fig. 3.30(a),  $(W/L)_1 = 20/0.5$ ,  $I_1 = 200 \ \mu A$ ,  $V_{TH0} = 0.6 \ V$ ,  $2\Phi_F = 0.7 \text{ V}$ ,  $\mu_n C_{ox} = 50 \ \mu \text{A/V}^2$ , and  $\gamma = 0.4 \ \text{V}^{1/2}$ .

(a) Calculate  $V_{out}$  for  $V_{in} = 1.2$  V.

(b) If  $I_1$  is implemented as  $M_2$  in Fig. 3.30(b), find the minimum value of  $(W/L)_2$  for which  $M_2$ remains saturated.

### Solution


(a) Since the threshold voltage of  $M_1$  depends on  $V_{out}$ , we perform a simple iteration. Noting that

$$(V_{in} - V_{TH} - V_{out})^2 = \frac{2I_D}{\mu_n C_{ox} \left(\frac{W}{L}\right)_1},$$
(3.81)

we first assume  $V_{TH} \approx 0.6$  V, obtaining  $V_{out} = 0.153$  V. Now we calculate a new  $V_{TH}$  as

$$V_{TH} = V_{TH0} + \gamma (\sqrt{2\Phi_F + V_{SB}} - \sqrt{2\Phi_F})$$
(3.82)

$$= 0.635 \text{ V}.$$

(3.83)

This indicates that  $V_{out}$  is approximately 35 mV less than that calculated above, i.e.,  $V_{out} \approx 0.119$  V.

(b) Since the drain-source voltage of  $M_2$  is equal to 0.119 V, the device is saturated only if  $(V_{GS} - V_{TH})_2 \le 0.119$  V. With  $I_D = 200 \ \mu$ A, this gives  $(W/L)_2 \ge 283/0.5$ . Note the substantial drain junction and overlap capacitance contributed by  $M_2$  to the output node.
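The iteration used in this example can be written out as a short script. It is a sketch under the example's own numbers, assuming the bulk of $M_1$ is grounded so that $V_{SB} = V_{out}$; the function names and the `corrections` parameter are introduced here for illustration only. With a single correction step it follows the hand calculation (about 0.118 V, close to the 0.119 V quoted above); allowing more steps refines the estimate slightly.

```python
import math


def follower_vout(vin, i_d, w_over_l, un_cox, vth0, gamma, two_phi_f, corrections=1):
    """Solve Eq. (3.81) for V_out, updating V_TH with Eq. (3.82).

    Assumes the bulk is grounded, so V_SB = V_out.
    `corrections` is the number of times V_TH is recomputed from V_out.
    """
    vov = math.sqrt(2.0 * i_d / (un_cox * w_over_l))  # overdrive from Eq. (3.81)
    vth = vth0                                        # first guess: no body effect
    vout = vin - vth - vov
    for _ in range(corrections):
        vth = vth0 + gamma * (math.sqrt(two_phi_f + vout) - math.sqrt(two_phi_f))
        vout = vin - vth - vov
    return vout, vth


def min_w_over_l_for_saturation(i_d, un_cox, vds):
    """Smallest W/L keeping the current source saturated: (V_GS - V_TH) <= V_DS."""
    return 2.0 * i_d / (un_cox * vds ** 2)


# Values taken from Example 3.7.
params = dict(vin=1.2, i_d=200e-6, w_over_l=20 / 0.5, un_cox=50e-6,
              vth0=0.6, gamma=0.4, two_phi_f=0.7)

for n in (0, 1, 20):
    vout, vth = follower_vout(corrections=n, **params)
    print(f"{n:2d} correction(s): V_TH = {vth:.4f} V, V_out = {vout:.4f} V")

# Part (b), using the V_DS value quoted in the text.
print("min (W/L)_2 =", round(min_w_over_l_for_saturation(200e-6, 50e-6, 0.119)))
```

Because the required $(W/L)_2$ scales as $1/V_{DS}^2$, a few millivolts of difference in $V_{out}$ changes the minimum width noticeably.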

To gain a better understanding of source followers, let us calculate the small-signal output resistance of the circuit in Fig. 3.31(a). Using the equivalent circuit of Fig. 3.31(b) and noting that  $V_1 = -V_X$ , we write

$$I_X - g_m V_X - g_{mb} V_X = 0. \tag{3.84}$$





As explained in Chapter 7, source followers also introduce substantial noise. For this reason, the circuit of Fig. 3.39(b) is ill-suited to low-noise applications.

## 3.4 Common-Gate Stage

In common-source amplifiers and source followers, the input signal is applied to the gate of a MOSFET. It is also possible to apply the signal to the source terminal. Shown in Fig. 3.40(a), a common-gate (CG) stage senses the input at the source and produces the output at the drain. The gate is connected to a dc voltage to establish proper operating conditions. Note that the bias current of  $M_1$  flows through the input signal source. Alternatively, as depicted in Fig. 3.40(b),  $M_1$  can be biased by a constant current source, with the signal capacitively coupled to the circuit.



Figure 3.40 (a) Common-gate stage with direct coupling at input, (b) CG stage with capacitive coupling at input.

We first study the large-signal behavior of the circuit in Fig. 3.40(a). For simplicity, let us assume that  $V_{in}$  decreases from a large positive value. For  $V_{in} \ge V_b - V_{TH}$ ,  $M_1$  is off and  $V_{out} = V_{DD}$ . For lower values of  $V_{in}$ , we can write

$$I_D = \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_b - V_{in} - V_{TH})^2, \qquad (3.95)$$

if  $M_1$  is in saturation. As  $V_{in}$  decreases, so does  $V_{out}$ , eventually driving  $M_1$  into the triode region if

$$V_{DD} - \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_b - V_{in} - V_{TH})^2 R_D = V_b - V_{TH}.$$
 (3.96)

The input-output characteristic is shown in Fig. 3.41. If  $M_1$  is saturated, we can express the output voltage as

$$V_{out} = V_{DD} - \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_b - V_{in} - V_{TH})^2 R_D, \tag{3.97}$$

Figure 3.41 Common-gate input-output characteristic.

obtaining a small-signal gain of

$$\frac{\partial V_{out}}{\partial V_{in}} = -\mu_n C_{ox} \frac{W}{L} (V_b - V_{in} - V_{TH}) \left( -1 - \frac{\partial V_{TH}}{\partial V_{in}} \right) R_D. \tag{3.98}$$

Since  $\partial V_{TH} / \partial V_{in} = \partial V_{TH} / \partial V_{SB} = \eta$ , we have

$$\frac{\partial V_{out}}{\partial V_{in}} = \mu_n C_{ox} \frac{W}{L} R_D (V_b - V_{in} - V_{TH}) (1 + \eta) \tag{3.99}$$

$$= g_m (1 + \eta) R_D. \tag{3.100}$$

Note that the gain is positive. Interestingly, body effect increases the equivalent transconductance of the stage.

The input impedance of the circuit is also important. We note that, for  $\lambda = 0$ , the impedance seen at the source of  $M_1$  in Fig. 3.40(a) is the same as that at the source of  $M_1$  in Fig. 3.31, namely,  $1/(g_m + g_{mb}) = 1/[g_m(1 + \eta)]$ . Thus, the body effect decreases the input impedance of the common-gate stage. The relatively low input impedance of the common-gate stage proves useful in some applications.

### Example 3.10.

In Fig. 3.42, transistor  $M_1$  senses  $\Delta V$  and delivers a proportional current to a 50- $\Omega$  transmission line. The other end of the line is terminated by a 50- $\Omega$  resistor in Fig. 3.42(a) and a common-gate stage in Fig. 3.42(b). Assume  $\lambda = \gamma = 0$ .

(a) Calculate  $V_{out}/V_{in}$  at low frequencies for both arrangements.

(b) What condition is necessary to minimize wave reflection at node X?

### Solution

(a) For small signals applied to the gate of  $M_1$ , the drain current experiences a change equal to  $g_{m1} \Delta V_X$ . This current is drawn from  $R_D$  in Fig. 3.42(a) and  $M_2$  in Fig. 3.42(b), producing an output voltage swing equal to  $-g_{m1}\Delta V_X R_D$ . Thus,  $A_v = -g_m R_D$  for both cases.

(b) To minimize reflection at node X, the resistance seen at the source of  $M_2$  must equal 50  $\Omega$ and the reactance must be small. Thus,  $1/(g_m + g_{mb}) = 50 \Omega$ , which can be ensured by proper sizing and biasing of  $M_2$ . To minimize the capacitances of the transistor, it is desirable to use a small device biased at a large current. (Recall that  $g_m = \sqrt{2\mu_n C_{ox}(W/L)I_D}$ .) In addition to higher power dissipation, this remedy also requires a large  $V_{GS}$  for  $M_2$ .
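The cost of the matching condition in part (b) can be made concrete with a quick calculation. This is a hedged sketch: with $\gamma = 0$ the condition reduces to $g_m = 1/(50\ \Omega)$, and the values used for $\mu_n C_{ox}$ and $W/L$ are hypothetical, not taken from the example.

```python
def required_bias_current(gm: float, un_cox: float, w_over_l: float) -> float:
    """Invert g_m = sqrt(2 * un_cox * (W/L) * I_D) for the bias current I_D."""
    return gm ** 2 / (2.0 * un_cox * w_over_l)


def overdrive_voltage(gm: float, i_d: float) -> float:
    """Square-law overdrive: V_GS - V_TH = 2 * I_D / g_m."""
    return 2.0 * i_d / gm


Z0 = 50.0             # line impedance in ohms (from the example)
gm_needed = 1.0 / Z0  # matching condition with gamma = 0, so g_mb = 0

UN_COX = 50e-6        # hypothetical process parameter in A/V^2
for w_over_l in (100, 400, 1600):  # hypothetical device sizes
    i_d = required_bias_current(gm_needed, UN_COX, w_over_l)
    vov = overdrive_voltage(gm_needed, i_d)
    print(f"W/L = {w_over_l:5d}: I_D = {i_d * 1e3:6.2f} mA, V_GS - V_TH = {vov:.2f} V")
```

Under these assumed numbers, shrinking the device raises both the bias current and the overdrive voltage, which is the trade-off the solution points out.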



Figure 2.20 Conceptual visualization of saturation and triode regions.

if  $V_D - V_G$  of a PFET is not large enough (<  $|V_{THP}|$ ), the device is saturated. Note that this view does not require knowledge of the source voltage. This means we must know a priori which terminal operates as the drain.

### Second-Order Effects

Our analysis of the MOS structure has thus far entailed various simplifying assumptions, some of which are not valid in many analog circuits. In this section, we describe three second-order effects that are essential in our subsequent circuit analyses. Other phenomena that appear in submicron devices are studied in Chapter 16.

**Body Effect** In the analysis of Fig. 2.10, we tacitly assumed that the bulk and the source of the transistor were tied to ground. What happens if the bulk voltage of an NFET drops below the source voltage (Fig. 2.21)? Since the S and D junctions remain reverse-biased, we surmise that the device continues to operate properly but certain characteristics may



Figure 2.21 NMOS device with negative bulk voltage.

change. To understand the effect, suppose  $V_S = V_D = 0$ , and  $V_G$  is somewhat less than  $V_{TH}$  so that a depletion region is formed under the gate but no inversion layer exists. As  $V_B$  becomes more negative, more holes are attracted to the substrate connection, leaving a larger negative charge behind, i.e., as depicted in Fig. 2.22, the depletion region becomes wider. Now recall from Eq. (2.1) that the threshold voltage is a function of the total charge in the depletion region because the gate charge must mirror  $Q_d$  before an inversion layer is



Figure 2.22 Variation of depletion region charge with bulk voltage.

formed. Thus, as  $V_B$  drops and  $Q_d$  increases,  $V_{TH}$  also increases. This is called the "body effect" or the "backgate effect."

It can be proved that with body effect:

$$V_{TH} = V_{TH0} + \gamma \left( \sqrt{|2\Phi_F + V_{SB}|} - \sqrt{|2\Phi_F|} \right), \qquad (2.22)$$

where  $V_{TH0}$  is given by (2.1),  $\gamma = \sqrt{2q\epsilon_{si}N_{sub}}/C_{ox}$  denotes the body effect coefficient, and  $V_{SB}$  is the source-bulk potential difference [1]. The value of  $\gamma$  typically lies in the range of 0.3 to 0.4 V<sup>1/2</sup>.
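Equation (2.22) is easy to tabulate. The sketch below is illustrative only; its default parameter values ($V_{TH0} = 0.6$ V, $\gamma = 0.4\ \text{V}^{1/2}$, $2\Phi_F = 0.7$ V) are assumptions chosen to be typical of the range quoted above.

```python
import math


def threshold_voltage(vsb: float, vth0: float = 0.6, gamma: float = 0.4,
                      two_phi_f: float = 0.7) -> float:
    """Threshold voltage with body effect, Eq. (2.22)."""
    return vth0 + gamma * (math.sqrt(abs(two_phi_f + vsb)) - math.sqrt(abs(two_phi_f)))


for vsb in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"V_SB = {vsb:.1f} V  ->  V_TH = {threshold_voltage(vsb):.3f} V")
```

The threshold rises with the square root of $V_{SB}$, so the first few hundred millivolts of source-bulk bias have the largest effect.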

#### Example 2.3 -

In Fig. 2.23(a), plot the drain current if  $V_X$  varies from  $-\infty$  to 0. Assume  $V_{TH0} = 0.6$  V,  $\gamma = 0.4$  V<sup>1/2</sup>, and  $2\Phi_F = 0.7$  V.





#### Solution


If  $V_X$  is sufficiently negative, the threshold voltage of  $M_1$  exceeds 1.2 V and the device is off. That is,

$$1.2 V = 0.6 + 0.4 \left( \sqrt{0.7 - V_{X1}} - \sqrt{0.7} \right), \qquad (2.23)$$

#### 6.1 Introduction

The design considerations for a simple inverter circuit were presented in the previous chapter. We now extend this discussion to address the synthesis of arbitrary digital gates, such as NOR, NAND, and XOR. The focus is on *combinational logic* or *nonregenerative* circuits—that is, circuits having the property that at any point in time, the output of the circuit is related to its current input signals by some Boolean expression (assuming that the transients through the logic gates have settled). No intentional connection from outputs back to inputs is present.

This is in contrast to another class of circuits, known as *sequential* or *regenerative*, for which the output is not only a function of the current input data, but also of previous values of the input signals (see Figure 6-1). This can be accomplished by connecting one or more outputs intentionally back to some inputs. Consequently, the circuit "remembers" past events and has a sense of *history*. A sequential circuit includes a combinational logic portion and a module that holds the state. Example circuits are registers, counters, oscillators, and memory. Sequential circuits are the topic of the next chapter.

There are numerous circuit styles to implement a given logic function. As with the inverter, the common design metrics by which a gate is evaluated are area, speed, energy, and power. Depending on the application, the emphasis will be on different metrics. For example, the switching speed of digital circuits is the primary metric in a high-performance processor, while in a battery operated circuit, it is energy dissipation. Recently, power dissipation also has become an important concern and considerable emphasis is placed on understanding the sources of power and approaches to dealing with power. In addition to these metrics, robustness to noise and reliability are also very important considerations. We will see that certain logic styles can significantly improve performance, but they usually are more sensitive to noise.

#### 6.2 Static CMOS Design

The most widely used logic style is static complementary CMOS. The static CMOS style is really an extension of the static CMOS inverter to multiple inputs. To review, the primary advantages of the CMOS structure are robustness (i.e., low sensitivity to noise), good performance, and low power consumption with no static power dissipation. Most of those properties are carried over to large fan-in logic gates implemented using a similar circuit topology.




The complementary CMOS circuit style falls under a broad class of logic circuits called static circuits in which at every point in time, each gate output is connected to either  $V_{DD}$  or  $V_{SS}$  via a low-resistance path. Also, the outputs of the gates assume at all times the value of the Boolean function implemented by the circuit (ignoring the transient effects during switching periods). This is in contrast to the dynamic circuit class, which relies on temporary storage of signal values on the capacitance of high-impedance circuit nodes. The latter approach has the advantage that the resulting gate is simpler and faster. Its design and operation are, however, more involved and prone to failure because of increased sensitivity to noise.

In this section, we sequentially address the design of various static circuit flavors, including complementary CMOS, ratioed logic (pseudo-NMOS and DCVSL), and pass-transistor logic. We also deal with issues of scaling to lower power supply voltages and threshold voltages.

#### 6.2.1 Complementary CMOS

#### Concept

A static CMOS gate is a combination of two networks—the *pull-up network* (PUN) and the *pull-down* network (PDN), as shown in Figure 6-2. The figure shows a generic N-input logic gate where all inputs are distributed to both the pull-up and pull-down networks. The function of the PUN is to provide a connection between the output and  $V_{DD}$  anytime the output of the logic gate is meant to be 1 (based on the inputs). Similarly, the function of the PDN is to connect the output to  $V_{SS}$  when the output of the logic gate is meant to be 0. The PUN and PDN networks are constructed in a mutually exclusive fashion such that *one and only one* of the networks is conducting in steady state. In this way, once the transients have settled, a path always exists between  $V_{DD}$  and the output F for a high output ("one"), or between  $V_{SS}$  and F for a low output ("zero"). This is equivalent to stating that the output node is always a *low-impedance* node in steady state.



Figure 6-2 Complementary logic gate as a combination of a PUN (pull-up network) and a PDN (pull-down network).


In constructing the PDN and PUN networks, the designer should keep the following observations in mind:

- A transistor can be thought of as a switch controlled by its gate signal. An NMOS switch is on when the controlling signal is high and is off when the controlling signal is low. A PMOS transistor acts as an inverse switch that is on when the controlling signal is low and off when the controlling signal is high.
- The PDN is constructed using NMOS devices, while PMOS transistors are used in the PUN. The primary reason for this choice is that NMOS transistors produce "strong zeros," and PMOS devices generate "strong ones." To illustrate this, consider the examples shown in Figure 6-3. In Figure 6-3a, the output capacitance is initially charged to  $V_{DD}$ . Two possible discharge scenarios are shown. An NMOS device pulls the output all the way down to GND, while a PMOS lowers the output no further than  $|V_{T_p}|$ ; the PMOS turns off at that point and stops contributing discharge current. NMOS transistors are thus the preferred devices in the PDN. Similarly, two alternative approaches to charging up a capacitor are shown in Figure 6-3b, with the output initially at GND. A PMOS switch succeeds in charging the output all the way to  $V_{DD}$ , while the NMOS device fails to raise the output above  $V_{DD} - V_{Tn}$ . This explains why PMOS transistors are preferentially used in a PUN.
- A set of rules can be derived to construct logic functions (see Figure 6-4). NMOS devices connected in series correspond to an AND function. With all the inputs high, the series combination conducts and the value at one end of the chain is transferred to the other end. Similarly, NMOS transistors connected in parallel represent an OR function. A conducting path exists between the output and input terminal if at least one of the inputs is high. Using similar arguments, construction rules for PMOS networks can be formulated. A series connection of PMOS conducts if both inputs are low, representing a NOR function  $(\overline{A} \cdot \overline{B} = \overline{A + B})$ , while PMOS transistors in parallel implement a NAND  $(\overline{A} + \overline{B} = \overline{A \cdot B})$ .

(a) Pulling down a node by using NMOS and PMOS switches

(b) Pulling up a node by using NMOS and PMOS switches

Figure 6-3 Simple examples illustrate why an NMOS should be used as a pull-down, and a PMOS should be used as a pull-up device.

Figure 6-4 NMOS logic rules: series devices implement an AND, and parallel devices implement an OR.

- Using De Morgan's theorems  $(\overline{A + B} = \overline{A} \cdot \overline{B} \text{ and } \overline{A \cdot B} = \overline{A} + \overline{B})$ , it can be shown that the pull-up and pull-down networks of a complementary CMOS structure are *dual* networks. This means that a parallel connection of transistors in the pull-up network corresponds to a series connection of the corresponding devices in the pull-down network, and vice versa. Therefore, to construct a CMOS gate, one of the networks (e.g., PDN) is implemented using combinations of series and parallel devices. The other network (i.e., PUN) is obtained using the duality principle by walking the hierarchy, replacing series subnets with parallel subnets, and parallel subnets with series subnets. The complete CMOS gate is constructed by combining the PDN with the PUN.
- The complementary gate is naturally *inverting*, implementing only functions such as NAND, NOR, and XNOR. The realization of a noninverting Boolean function (such as AND, OR, or XOR) in a single stage is not possible, and requires the addition of an extra inverter stage.
- The number of transistors required to implement an N-input logic gate is 2N.
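The construction rules listed above can be sketched as a small switch-level model. This is an illustrative sketch, not a method from the text: the nested-tuple representation and all function names are assumptions made here. The PUN is derived from the PDN by swapping series and parallel connections, NMOS switches conduct on a high input, PMOS switches on a low input, and the model checks that exactly one network conducts for every input combination.

```python
from itertools import product


def var(name):
    return ("var", name)


def series(*children):
    return ("series", list(children))


def parallel(*children):
    return ("parallel", list(children))


def dual(net):
    """Swap series and parallel connections throughout the network."""
    if net[0] == "var":
        return net
    swapped = "parallel" if net[0] == "series" else "series"
    return (swapped, [dual(child) for child in net[1]])


def conducts(net, inputs, on_when_high):
    """True if the switch network conducts; NMOS: on_when_high=True, PMOS: False."""
    if net[0] == "var":
        return inputs[net[1]] == on_when_high
    states = [conducts(child, inputs, on_when_high) for child in net[1]]
    return all(states) if net[0] == "series" else any(states)


def variables(net):
    if net[0] == "var":
        return {net[1]}
    names = set()
    for child in net[1]:
        names |= variables(child)
    return names


def gate_output(pdn, inputs):
    """Output of the complementary gate built from `pdn` and its dual PUN."""
    pull_down = conducts(pdn, inputs, on_when_high=True)
    pull_up = conducts(dual(pdn), inputs, on_when_high=False)
    if pull_up == pull_down:
        raise ValueError("PUN and PDN are not mutually exclusive")
    return pull_up  # F = 1 exactly when the PUN connects the output to VDD


# Two-input NAND: the PDN is two NMOS devices in series.
nand_pdn = series(var("A"), var("B"))
names = sorted(variables(nand_pdn))
for values in product([False, True], repeat=len(names)):
    inputs = dict(zip(names, values))
    print(inputs, "->", int(gate_output(nand_pdn, inputs)))
```

The same functions apply to larger gates; for example, `parallel(var("D"), series(var("A"), parallel(var("B"), var("C"))))` describes a pull-down network that conducts when $D + A \cdot (B + C)$ is true.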

# Example 6.1 Two-Input NAND Gate

Figure 6-5 shows a two-input NAND gate  $(F = \overline{A \cdot B})$ . The PDN network consists of two NMOS devices in series that conduct when both A and B are high. The PUN is the dual



Figure 6-5 Two-input NAND gate in complementary static CMOS style.


# Chapter 6 • Designing Combinational Logic Gates in CMOS

network, and it consists of two parallel PMOS transistors. This means that F is 1 if A = 0 or B = 0, which is equivalent to  $F = \overline{A \cdot B}$ . The truth table for the simple two-input NAND gate is given in Table 6-1. It can be verified that the output F is always connected to either  $V_{DD}$  or GND, but never to both at the same time.

| A | B | F |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

Table 6-1 Truth table for two-input NAND.

#### Example 6.2 Synthesis of Complex CMOS Gate

Using complementary CMOS logic, consider the synthesis of a complex CMOS gate whose function is  $F = \overline{D + A \cdot (B + C)}$ . The first step in the synthesis of the logic gate is to derive the pull-down network as shown in Figure 6-6a by using the fact that NMOS devices in series implement the AND function and parallel devices implement the OR function. The next step is to use duality to derive the PUN in a hierarchical fashion. The PDN network is broken into smaller networks (i.e., subsets of the PDN) called subnets that simplify the derivation of the PUN. In Figure 6-6b, the subnets (SN) for the pull-down net-

Figure 6-6 Complex complementary CMOS gate.




Figure 6-25 Parallel versus time-multiplexed data busses: (a) parallel data transmission, (b) serial data transmission.

the bus toggles between 0 and 1. Care must be taken in digital systems to avoid time-multiplexing data streams with very distinct data characteristics.

**4. Glitch Reduction by Balancing Signal Paths** The occurrence of glitching in a circuit is mainly due to a mismatch in the path lengths in the network. If all input signals of a gate change simultaneously, no glitching occurs. On the other hand, if input signals change at different times, a dynamic hazard might develop. Such a mismatch in signal timing is typically the result of different path lengths with respect to the primary inputs of the network. This is illustrated in Figure 6-26. Assume that the XOR gate has a unit delay. The first network (a) suffers from glitching as a result of the wide disparity between the arrival times of the input signals for a gate. For example, for gate  $F_3$, one input settles at time 0, while the second one only arrives at time 2. Redesigning the network so that all arrival times are identical can dramatically reduce the number of superfluous transitions (network b).



(a) Network sensitive to glitching

(b) Glitch-free network

Figure 6-26 Glitching is influenced by matching of signal path lengths. The annotated numbers indicate the signal arrival times.
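The arrival-time argument can be reproduced with a small script. The sketch below is an assumption-laden illustration: the two netlists (a chain and a balanced tree of unit-delay XOR gates over four primary inputs named `x0` to `x3`) are hypothetical stand-ins for the networks of Figure 6-26, chosen so that the chain matches the statement that one input of $F_3$ settles at time 0 while the other arrives at time 2.

```python
def arrival_and_skew(netlist, primary_inputs, gate_delay=1):
    """Compute each gate's output arrival time and the skew between its inputs.

    `netlist` is a list of (gate_name, (input_a, input_b)) in topological order.
    Primary inputs are assumed to settle at time 0.
    """
    arrival = {name: 0 for name in primary_inputs}
    skew = {}
    for gate, fanin in netlist:
        times = [arrival[signal] for signal in fanin]
        skew[gate] = max(times) - min(times)  # nonzero skew allows a glitch
        arrival[gate] = max(times) + gate_delay
    return arrival, skew


inputs = ["x0", "x1", "x2", "x3"]

# Hypothetical chain: each gate waits for the previous gate's output.
chain = [("F1", ("x0", "x1")), ("F2", ("F1", "x2")), ("F3", ("F2", "x3"))]

# Hypothetical balanced tree: both inputs of every gate arrive together.
tree = [("F1", ("x0", "x1")), ("F2", ("x2", "x3")), ("F3", ("F1", "F2"))]

for label, netlist in (("chain", chain), ("tree", tree)):
    arrival, skew = arrival_and_skew(netlist, inputs)
    print(label, "input skew per gate:", skew, "| output ready at t =", arrival["F3"])
```

Under these assumptions the balanced tree has zero input skew at every gate and also produces its final output earlier than the chain.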

#### Summary

The CMOS logic style described in the previous section is highly robust and scalable with technology, but requires 2N transistors to implement an N-input logic gate. Also, the load capacitance is significant, since each gate drives two devices (a PMOS and an NMOS) per fan-out. This has opened the door for alternative logic families that either are simpler or faster.

#### 6.2.2 Ratioed Logic

#### Concept

Ratioed logic is an attempt to reduce the number of transistors required to implement a given logic function, often at the cost of reduced robustness and extra power dissipation. The purpose





Figure 6-27 Ratioed logic gate.

of the PUN in complementary CMOS is to provide a conditional path between  $V_{DD}$  and the output when the PDN is turned off. In ratioed logic, the entire PUN is replaced with a single unconditional load device that pulls up the output for a high output as in Figure 6-27a. Instead of a combination of active pull-down and pull-up networks, such a gate consists of an NMOS pull-down network that realizes the *logic function*, and a simple *load device*. Figure 6-27b shows an example of ratioed logic, which uses a grounded PMOS load and is referred to as a pseudo-NMOS gate.

The clear advantage of a pseudo-NMOS gate is the reduced number of transistors (N + 1, versus 2N for complementary CMOS). The nominal high output voltage  $(V_{OH})$  for this gate is  $V_{DD}$  since the pull-down devices are turned off when the output is pulled high (assuming that  $V_{OL}$  is below  $V_{Tn}$ ). On the other hand, the **nominal low output voltage is not 0 V**, since there is contention between the devices in the PDN and the grounded PMOS load device. This results in reduced noise margins and, more importantly, static power dissipation. The sizing of the load device relative to the pull-down devices can be used to trade off parameters such as noise margin, propagation delay, and power dissipation. Since the voltage swing on the output and the overall functionality of the gate depend on the ratio of the NMOS and PMOS sizes, the circuit is called ratioed. This is in contrast to the ratioless logic styles, such as complementary CMOS, where the low and high levels do not depend on transistor sizes.

Computing the dc-transfer characteristic of the pseudo-NMOS proceeds along paths similar to those used for its complementary CMOS counterpart. The value of  $V_{OL}$  is obtained by equating the currents through the driver and load devices for  $V_{in} = V_{DD}$ . At this operation point, it is reasonable to assume that the NMOS device resides in linear mode (since, ideally, the output should be close to 0V), while the PMOS load is saturated:

$$k_n \left( (V_{DD} - V_{Tn}) V_{OL} - \frac{V_{OL}^2}{2} \right) + k_p \left( (-V_{DD} - V_{Tp}) \cdot V_{DSATp} - \frac{V_{DSATp}^2}{2} \right) = 0$$
(6.27)


Assuming that  $V_{OL}$  is small relative to the gate drive  $(V_{DD} - V_T)$ , and that  $V_{Tn}$  is equal to  $V_{Tp}$  in magnitude,  $V_{OL}$  can be approximated as


$$V_{OL} \approx \frac{k_p (V_{DD} + V_{Tp}) \cdot V_{DSATp}}{k_n (V_{DD} - V_{Tn})} \approx \frac{\mu_p \cdot W_p}{\mu_n \cdot W_n} \cdot V_{DSATp}$$
(6.28)


In order to make  $V_{OL}$  as small as possible, the PMOS device should be sized much smaller than the NMOS pull-down devices. Unfortunately, this has a negative impact on the *propagation* delay for charging up the output node since the current provided by the PMOS device is limited.

A major disadvantage of the pseudo-NMOS gate is the static power that is dissipated when the output is low through the direct current path that exists between  $V_{DD}$  and GND. The static power consumption in the low-output mode is easily derived:

$$P_{low} = V_{DD} I_{low} \approx V_{DD} \cdot \left| k_p \left( (-V_{DD} - V_{Tp}) \cdot V_{DSATp} - \frac{V_{DSATp}^2}{2} \right) \right| \tag{6.29}$$

#### Example 6.7 Pseudo-NMOS Inverter

Consider a simple pseudo-NMOS inverter (where the PDN in Figure 6-27 degenerates to a single transistor) with an NMOS size of $0.5\,\mu\text{m}/0.25\,\mu\text{m}$. In this example, we study the effect of sizing the PMOS device to demonstrate the impact on various parameters. The W/L ratio of the grounded PMOS is varied over the values 4, 2, 1, 0.5, and 0.25. Devices with W/L < 1 are constructed by making the length greater than the width. The voltage transfer curve for the different sizes is plotted in Figure 6-28.

Table 6-9 summarizes the nominal output voltage  $(V_{OL})$ , static power dissipation, and the low-to-high propagation delay. The low-to-high delay is measured as the time it takes to reach 1.25 V from  $V_{OL}$  (which is not 0V for this inverter)—by definition. The trade-off between the static and dynamic properties is apparent. A larger pull-up device not only improves performance, but also increases static power dissipation and lowers noise margins by increasing  $V_{OL}$ .





# Chapter 6 • Designing Combinational Logic Gates in CMOS


| PMOS size (W/L) | V<sub>OL</sub> | Static power dissipation | t<sub>pLH</sub> |
|-----------------|----------------|--------------------------|-----------------|
| 4               | 0.693 V        | 564 μW                   | 14 ps           |
| 2               | 0.273 V        | 298 μW                   | 56 ps           |
| 1               | 0.133 V        | 160 μW                   | 123 ps          |
| 0.5             | 0.064 V        | 80 μW                    | 268 ps          |
| 0.25            | 0.031 V        | 41 μW                    | 569 ps          |

Table 6-9 Performance of a pseudo-NMOS inverter.


Notice that the simple first-order model to predict $V_{OL}$ is quite effective. For a PMOS W/L of 4, $V_{OL}$ is given by (30/115)(4)(0.63 V) = 0.66 V, close to the simulated value of 0.693 V in Table 6-9.
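The arithmetic of the first-order estimate can be repeated for every PMOS size in Table 6-9. The sketch below is only an illustration: it multiplies the same three factors the text uses, and reading 30/115 as the PMOS-to-NMOS transconductance ratio and 0.63 V as the saturation-voltage term is an assumption, as are the names `K_RATIO`, `V_SAT`, and `first_order_vol`.

```python
# Reproduce the first-order V_OL arithmetic used in the worked example.
# ASSUMPTION: 30/115 is read as the PMOS/NMOS transconductance ratio and
# 0.63 V as the saturation-voltage term; both numbers are copied from the
# example, and the size factor is applied exactly as the text applies it.
K_RATIO = 30.0 / 115.0
V_SAT = 0.63  # volts

# Simulated V_OL values copied from Table 6-9, keyed by PMOS W/L.
SIMULATED_VOL = {4: 0.693, 2: 0.273, 1: 0.133, 0.5: 0.064, 0.25: 0.031}


def first_order_vol(pmos_size):
    """Estimate V_OL in volts as ratio * size * saturation voltage."""
    return K_RATIO * pmos_size * V_SAT


if __name__ == "__main__":
    for size, simulated in SIMULATED_VOL.items():
        estimate = first_order_vol(size)
        print(f"W/L = {size:<5} estimate = {estimate:.3f} V   "
              f"simulated = {simulated:.3f} V")
```

The estimate scales linearly with the PMOS size, which mirrors the trend in the table: halving the load device roughly halves $V_{OL}$.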

The static power dissipation of pseudo-NMOS limits its use. When area is most important, however, its reduced transistor count compared with complementary CMOS is quite attractive. Pseudo-NMOS thus still finds occasional use in large fan-in circuits. Figure 6-29 shows the schematics of pseudo-NMOS NOR and NAND gates.






Figure 6-32 Advantage of differential (b) over single-ended (a) gate.

#### 6.2.3 Pass-Transistor Logic

#### Pass-Transistor Basics

A popular and widely used alternative to complementary CMOS is *pass-transistor logic*, which attempts to reduce the number of transistors required to implement logic by allowing the primary inputs to drive gate terminals as well as source-drain terminals [Radhakrishnan85]. This is in contrast to logic families that we have studied so far, which only allow primary inputs to drive the gate terminals of MOSFETs.

Figure 6-33 shows an implementation of the AND function constructed that way, using only NMOS transistors. In this gate, if the *B* input is high, the top transistor is turned on and copies the input *A* to the output *F*. When *B* is low, the bottom pass-transistor is turned on and passes a 0. The switch driven by  $\overline{B}$  seems to be redundant at first glance. Its presence is essential to ensure that the gate is static—a low-impedance path must exist to the supply rails under all circumstances (in this particular case, when *B* is low).

The promise of this approach is that fewer transistors are required to implement a given function. For example, the implementation of the AND gate in Figure 6-33 requires 4 transistors (including the inverter required to invert B), while a complementary CMOS implementation would require 6 transistors. The reduced number of devices has the additional advantage of lower capacitance.
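The switching behavior of the pass-transistor AND gate described above can be checked with a small switch-level model. This is a minimal sketch under stated assumptions: the transistors are treated as ideal switches (the threshold drop of an NMOS passing a high level is ignored), and the function name `pass_transistor_and` is hypothetical.

```python
from itertools import product


def pass_transistor_and(a, b):
    """Switch-level model of the two-transistor pass network.

    The top NMOS (gate = B) passes A to F; the bottom NMOS
    (gate = NOT B) passes a constant 0 to F.
    ASSUMPTION: ideal switches, no threshold-voltage loss.
    """
    not_b = 1 - b              # output of the inverter on B
    drivers = []
    if b:                      # top switch closed: F follows A
        drivers.append(a)
    if not_b:                  # bottom switch closed: F is tied to 0
        drivers.append(0)
    # Exactly one switch conducts, so F is always driven (the gate is static).
    assert len(drivers) == 1
    return drivers[0]


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(f"A={a} B={b} -> F={pass_transistor_and(a, b)}")
```

Removing the bottom switch from the model would leave `drivers` empty whenever B is low, which is the floating-output condition the second transistor exists to prevent.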



Figure 6-33 Pass-transistor implementation of an AND gate.


Obviously, the number of switches per segment grows with increasing values of $t_{buf}$. In current technologies, $m_{opt}$ typically equals 3 or 4. The presented analysis ignores that $t_{buf}$ itself is a function of the load *m*. A more accurate analysis taking this factor into account is presented in Chapter 9.

#### Example 6.14 Transmission-Gate Chain

Consider the same 16-transmission-gate chain. The buffers shown in Figure 6-51 can be implemented as inverters (instead of two cascaded inverters). In some cases, it might be necessary to add an extra inverter to produce the correct polarity. Assuming that each inverter is sized such that the NMOS is $0.5\,\mu\text{m}/0.25\,\mu\text{m}$ and the PMOS is $0.5\,\mu\text{m}/0.25\,\mu\text{m}$, Eq. (6.39) predicts that an inverter must be inserted every 3 transmission gates. The simulated delay when placing an inverter every two transmission gates is 154 ps; for every three transmission gates, the delay is 154 ps; and for four transmission gates, it is 164 ps. The insertion of buffering inverters reduces the delay by a factor of almost 2.

**CAUTION:** Although many of the circuit styles discussed in the previous sections sound very interesting, and might be superior to static CMOS in many respects, none has the *robustness and ease of design* of complementary CMOS. Therefore, use them sparingly and with caution. For designs that have no extreme area, complexity, or speed constraints, complementary CMOS is the recommended design style.

#### 6.3 Dynamic CMOS Design

It was noted earlier that static CMOS logic with a fan-in of N requires 2N devices. A variety of approaches were presented to reduce the number of transistors required to implement a given logic function including pseudo-NMOS, pass-transistor logic, etc. The pseudo-NMOS logic style requires only N + 1 transistors to implement an N input logic gate, but unfortunately it has static power dissipation. In this section, an alternate logic style called *dynamic logic* is presented that obtains a similar result, while avoiding static power consumption. With the addition of a clock input, it uses a sequence of *precharge* and conditional *evaluation* phases.

#### 6.3.1 Dynamic Logic: Basic Principles

The basic construction of an (*n*-type) dynamic logic gate is shown in Figure 6-52a. The PDN (pull-down network) is constructed exactly as in complementary CMOS. The operation of this circuit is divided into two major phases—*precharge* and *evaluation*—with the mode of operation determined by the *clock signal CLK*.

#### Precharge

When CLK = 0, the output node *Out* is precharged to $V_{DD}$ by the PMOS transistor $M_p$. During that time, the evaluate NMOS transistor $M_e$ is off, so that the pull-down path is disabled. The evaluation FET eliminates any static power that would be consumed during the precharge period (i.e., static current would flow between the supplies if both the pull-down and the precharge device were turned on simultaneously).

#### Evaluation

For CLK = 1, the precharge transistor  $M_p$  is off, and the evaluation transistor  $M_e$  is turned on. The output is conditionally discharged based on the input values and the pull-down topology. If the inputs are such that the PDN conducts, then a low resistance path exists between *Out* and GND, and the output is discharged to GND. If the PDN is turned off, the precharged value remains stored on the output capacitance  $C_L$ , which is a combination of junction capacitances, the wiring capacitance, and the input capacitance of the fan-out gates. During the evaluation phase, the only possible path between the output node and a supply rail is to GND. Consequently, once *Out* is discharged, it cannot be charged again until the next precharge operation. The inputs to the gate can thus make at most one transition during evaluation. Notice that the output can be in the *high-impedance state* during the evaluation period if the pull-down network is turned off. This behavior is fundamentally different from the static counterpart that always has a low resistance path between the output and one of the power rails.

As an example, consider the circuit shown in Figure 6-52b. During the precharge phase (CLK = 0), the output is precharged to  $V_{DD}$  regardless of the input values, because the evaluation device is turned off. During evaluation (CLK = 1), a conducting path is created between *Out* and GND if (and only if)  $A \cdot B + C$  is TRUE. Otherwise, the output remains at the precharged state of  $V_{DD}$ . The following function is thus realized:

$$Out = \overline{CLK} + \overline{(A \cdot B + C)} \cdot CLK \tag{6.40}$$
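The precharge and evaluate behavior described above can be captured in a small behavioral model. This is a sketch for illustration only: the class `DynamicGate` and the helper `expected_after_precharge` are hypothetical names, and the model is purely logical, with no timing, leakage, or charge sharing.

```python
from itertools import product


class DynamicGate:
    """Behavioral model of an n-type dynamic gate with PDN = A*B + C."""

    def __init__(self):
        self.out = 1  # ASSUMPTION: the node starts out precharged

    def step(self, clk, a, b, c):
        if clk == 0:
            self.out = 1          # precharge: M_p on, M_e off
        elif (a and b) or c:
            self.out = 0          # evaluate: PDN conducts, node discharges
        # Otherwise the node floats and keeps its stored value.
        return self.out


def expected_after_precharge(clk, a, b, c):
    """Output expected when every evaluation follows a precharge."""
    pdn_conducts = 1 if ((a and b) or c) else 0
    return 1 if clk == 0 else 1 - pdn_conducts


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        gate = DynamicGate()
        gate.step(0, a, b, c)             # precharge phase
        out = gate.step(1, a, b, c)       # evaluation phase
        print(f"A={a} B={b} C={c} -> Out={out}")
```

Calling `step` twice in a row with `clk=1` shows the one-way nature of evaluation: once the node has been discharged, turning the pull-down network off again does not restore the high level until the next precharge.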

#### 6.3.2 Speed and Power Dissipation of Dynamic Logic

The main advantages of dynamic logic are increased speed and reduced implementation area. Using fewer devices to implement a given logic function implies that the overall load capacitance is much smaller. The analysis of the switching behavior of the gate has some interesting peculiarities to it. After the precharge phase, the output is high. For a low input signal, no additional switching occurs. As a result, $t_{pLH} = 0$! The high-to-low transition, on the other hand, requires the discharging of the output capacitance through the pull-down network. Therefore, $t_{pHL}$ is proportional to $C_L$ and inversely related to the current-sinking capability of the pull-down network. The presence of the evaluation transistor slows the gate somewhat, as it presents an extra series resistance. Omitting this transistor, while functionally not forbidden, may result in static power dissipation and potentially a performance loss.

The preceding analysis is somewhat unfair because it ignores the influence of the precharge time on the switching speed of the gate. The precharge time is determined by the time it takes to charge  $C_L$  through the PMOS precharge transistor. During this time, the logic in the gate cannot be utilized. Very often, however, the overall digital system can be designed in such a way that the precharge time coincides with other system functions. For instance, the precharge of the arithmetic unit in a microprocessor could coincide with the instruction decode. The designer has to be aware of this "dead zone" in the use of dynamic logic and thus should carefully consider the pros and cons of its usage, taking the overall system requirements into account.

#### Example 6.15 A Four-Input Dynamic NAND Gate

Figure 6-53 shows a four-input NAND gate designed using the dynamic-circuit style. Due to the dynamic nature of the gate, the derivation of the voltage-transfer characteristic diverges from the traditional approach. As discussed earlier, we assume that the switching threshold of the gate equals the threshold of the NMOS pull-down transistor. This results in asymmetrical noise margins, as shown in Table 6-10.

Figure 6-53 Schematic and transient response of a four-input dynamic NAND gate.

| Transistors | V<sub>OH</sub> | V<sub>OL</sub> | V<sub>M</sub>  | NM<sub>H</sub>         | NM<sub>L</sub> | t<sub>pHL</sub> | t<sub>pLH</sub> | t<sub>pre</sub> |
|-------------|----------------|----------------|----------------|------------------------|----------------|-----------------|-----------------|-----------------|
| 6           | 2.5 V          | 0 V            | V<sub>Tn</sub> | 2.5 V − V<sub>Tn</sub> | V<sub>Tn</sub> | 110 ps          | 0 ps            | 83 ps           |

Table 6-10 The dc and ac parameters of a four-input dynamic NAND.

The dynamic behavior of the gate is simulated with SPICE. It is assumed that all inputs are set high when the clock goes high. On the rising edge of the clock, the output node is discharged. The resulting transient response is plotted in Figure 6-53, and the propagation delays are summarized in Table 6-10. The duration of the precharge cycle can be adjusted by changing the size of the PMOS precharge transistor. Making the PMOS too large should be avoided, however, as it both slows down the gate and increases the capacitive load on the clock line. For large designs, the latter factor might become a major design concern because the clock load can become excessive and hard to drive.

As mentioned earlier, the static gate parameters are time dependent. To illustrate this, consider a four-input NAND gate with all the inputs tied together and making a low-to-high transition. Figure 6-54 shows a transient simulation of the output voltage for three different input transitions: from 0 to 0.45 V, 0.5 V, and 0.55 V, respectively. In the preceding discussion, we have defined the switching threshold of the dynamic gate as the device threshold. However, notice that the amount by which the output voltage drops is a strong function of the input voltage and the *available evaluation time*. The noise voltage needed to corrupt the signal has to be larger if the evaluation time is short. In other words, the switching threshold is truly time dependent.



**Figure 6-54** Effect of an input glitch on the output. The switching threshold depends on the time for evaluation. A larger glitch is acceptable if the evaluation phase is shorter.




Figure 6-58 Static bleeders compensate for the charge leakage.

adding a *bleeder transistor*, as shown in Figure 6-58a. The only function of the bleeder—an NMOS style pull-up device—is to compensate for the charge lost due to the pull-down leakage paths. To avoid the ratio problems associated with this style of circuit and the associated static power consumption, the bleeder resistance is made high (in other words, the device is kept small). This allows the (strong) pull-down devices to lower the *Out* node substantially below the switching threshold of the next gate. Often, the bleeder is implemented in a feedback configuration to eliminate the static power dissipation altogether (Figure 6-58b).

#### **Charge Sharing**

Another important concern in dynamic logic is the impact of charge sharing. Consider the circuit in Figure 6-59. During the precharge phase, the output node is precharged to  $V_{DD}$ . Assume that all inputs are set to 0 during precharge, and that the capacitance  $C_a$  is discharged. Assume further that input B remains at 0 during evaluation, while input A makes a  $0 \rightarrow 1$  transition, turning transistor  $M_a$  on. The charge stored originally on capacitor  $C_L$  is redistributed over  $C_L$  and  $C_a$ . This causes a drop in the output voltage, which cannot be recovered due to the dynamic nature of the circuit.

The influence on the output voltage is readily calculated. Under the assumptions given previously, the following initial conditions are valid:  $V_{out}(t=0) = V_{DD}$  and  $V_X(t=0) = 0$ . As a result, two possible scenarios must be considered:

1.  $\Delta V_{out} < V_{Tn}$ . In this case, the final value of  $V_X$  equals  $V_{DD} - V_{Tn}(V_X)$ . Charge conservation then yields

$$C_L V_{DD} = C_L V_{out}(\text{final}) + C_a [V_{DD} - V_{Tn}(V_X)]$$

or

$$\Delta V_{out} = V_{out}(\text{final}) - V_{DD} = -\frac{C_a}{C_L} [V_{DD} - V_{Tn}(V_X)] \tag{6.43}$$




Figure 6-59 Charge sharing in dynamic networks.


2.  $\Delta V_{out} > V_{Tn}$ .  $V_{out}$  and  $V_X$  then reach the same value:

$$\Delta V_{out} = -V_{DD} \left( \frac{C_a}{C_a + C_L} \right) \tag{6.44}$$

We determine which of these scenarios is valid by the capacitance ratio. The boundary condition between the two cases can be determined by setting  $\Delta V_{out}$  equal to  $V_{Tn}$  in Eq. (6.44), yielding

$$\frac{C_a}{C_L} = \frac{V_{Tn}}{V_{DD} - V_{Tn}} \tag{6.45}$$

Case 1 holds when the  $(C_a/C_L)$  ratio is smaller than the condition defined in Eq. (6.45). If not, Eq. (6.44) is valid. Overall, it is desirable to keep the value of  $\Delta V_{out}$  below  $|V_{Tp}|$ . The output of the dynamic gate might be connected to a static inverter, in which case the low level of  $V_{out}$  would cause static power consumption. One major concern is a circuit malfunction if the output voltage is brought below the switching threshold of the gate it drives.
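The two charge-sharing cases and the boundary of Eq. (6.45) can be combined into a single helper. The sketch below is illustrative: the function name `charge_sharing_drop` and the numeric values are hypothetical, and $V_{Tn}$ is treated as a constant, which ignores the dependence on $V_X$ (body effect) that appears in the case 1 expression.

```python
def charge_sharing_drop(c_a, c_l, v_dd, v_tn):
    """Return the change in output voltage (a negative number, in volts).

    ASSUMPTION: v_tn is a constant; the dependence of V_Tn on V_X is ignored.
    """
    ratio = c_a / c_l
    boundary = v_tn / (v_dd - v_tn)           # Eq. (6.45)
    if ratio < boundary:
        # Case 1: the internal node stops charging at V_DD - V_Tn.
        return -ratio * (v_dd - v_tn)         # Eq. (6.43)
    # Case 2: output and internal node settle at the same voltage.
    return -v_dd * c_a / (c_a + c_l)          # Eq. (6.44)


if __name__ == "__main__":
    # Hypothetical values: V_DD = 2.5 V, V_Tn = 0.5 V, C_L = 50 fF.
    for c_a in (10e-15, 25e-15):
        dv = charge_sharing_drop(c_a, 50e-15, 2.5, 0.5)
        print(f"C_a = {c_a * 1e15:.0f} fF -> delta V_out = {dv:.3f} V")
```

With a constant threshold, the two expressions give the same drop of $-V_{Tn}$ exactly at the boundary ratio, so the helper is continuous across the two cases.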

#### Example 6.18 Charge Sharing

Let us consider the impact of charge sharing on the dynamic logic gate shown in Figure 6-60, which implements a three-input EXOR function  $y = A \oplus B \oplus C$ . The first question to be resolved is what conditions cause the worst case voltage drop on node y. For simplicity, ignore the load inverter, and assume that all inputs are low during the precharge operation and that all isolated internal nodes  $(V_a, V_b, V_c, \text{ and } V_d)$  are initially at 0 V.

Inspection of the truth table for this particular logic function shows that the output stays high for 4 out of 8 cases. The worst case change in output is obtained by exposing the maximum amount of internal capacitance to the output node during the evaluation


to  $V_{DD}$  during precharge, charge sharing does not occur. This solution obviously comes at the cost of increased area and capacitance.

#### **Capacitive Coupling**

The relatively high impedance of the output node makes the circuit very sensitive to crosstalk effects. A wire routed over or next to a dynamic node may couple capacitively and destroy the state of the floating node. Another equally important form of capacitive coupling is *backgate* (or *output-to-input*) coupling. Consider the circuit shown in Figure 6-62a, in which a dynamic two-input NAND gate drives a static NAND gate. A transition in the input *In* of the static gate may cause the output of the gate ( $Out_2$ ) to go low. This output transition couples capacitively to the other input of the gate (the dynamic node  $Out_1$ ) through the gate–source and gate–drain capacitances of transistor  $M_4$ . A simulation of this effect is shown in Figure 6-62b. It demonstrates how the coupling causes the output of the dynamic gate  $Out_1$  to drop significantly. This further causes the output of the static NAND gate not to drop all the way down to 0 V and a small amount of static power to be dissipated. If the voltage drop is large enough, the circuit can evaluate incorrectly, and the NAND output may not go low. When designing and laying out dynamic circuits, special care is needed to minimize capacitive coupling.

#### **Clock Feedthrough**

A special case of capacitive coupling is clock feedthrough, an effect caused by the capacitive coupling between the clock input of the precharge device and the dynamic output node. The coupling capacitance consists of the gate-to-drain capacitance of the precharge device, and includes both the overlap and channel capacitances. This capacitive coupling causes the output of the dynamic node to rise above  $V_{DD}$  on the low-to-high transition of the clock, assuming that the pull-down network is turned off. Subsequently, the fast rising and falling edges of the clock couple onto the signal node, as is quite apparent in the simulation of Figure 6-62b.

The danger of clock feedthrough is that it may cause the normally reverse-biased junction diodes of the precharge transistor to become forward biased. This causes electron injection into the substrate, which can be collected by a nearby high-impedance node in the 1 state, eventually resulting in faulty operation. CMOS latchup might be another result of this injection. For all purposes, high-speed dynamic circuits should be carefully simulated to ensure that clock feedthrough effects stay within bounds.

All of the preceding considerations demonstrate that the design of dynamic circuits is rather tricky and requires extreme care. It should therefore be attempted only when high performance is required, or when high-quality design-automation tools are available.

#### 6.3.4 Cascading Dynamic Gates

Besides the signal integrity issues, there is one major catch that complicates the design of dynamic circuits: Straightforward cascading of dynamic gates to create multilevel logic structures does not work. The problem is best illustrated with two cascaded n-type dynamic inverters, shown in Figure 6-63a. During the precharge phase (i.e., CLK = 0), the outputs of both inverters are precharged to $V_{DD}$. Assume that the primary input *In* makes a $0 \rightarrow 1$ transition (Figure 6-63b). On the rising edge of the clock, output $Out_1$ starts to discharge. The second output should remain in the precharged state of $V_{DD}$ as its expected value is 1 ($Out_1$ transitions to 0 during evaluation). However, there is a finite propagation delay for the input to discharge $Out_1$ to GND. Therefore, the second output also starts to discharge. As long as $Out_1$ exceeds the switching threshold of the second gate, which approximately equals $V_{Tn}$, a conducting path exists between $Out_2$ and GND, and precious charge is lost at $Out_2$. The conducting path is only disabled once $Out_1$ reaches $V_{Tn}$, and turns off the NMOS pull-down transistor. This leaves $Out_2$ at an intermediate voltage level. The correct level will not be recovered, because dynamic gates rely on capacitive storage, in contrast to static gates, which have dc restoration. The charge loss leads to reduced noise margins and potential malfunctioning.
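The charge loss at the second output can be illustrated with a toy numerical model. This is not a circuit simulation: it assumes that $Out_1$ decays exponentially with a time constant `tau`, and that the second pull-down conducts with a strength proportional to its gate overdrive $(Out_1 - V_{Tn})$. Both assumptions, the default values, and the name `simulate_cascade` are hypothetical.

```python
def simulate_cascade(v_dd=2.5, v_tn=0.5, tau=1.0, dt=1e-4):
    """Forward-Euler toy model of two cascaded dynamic inverters.

    ASSUMPTIONS: Out1 discharges as a simple RC node; the second
    pull-down has a conductance that scales linearly with its
    overdrive and cuts off completely once Out1 reaches V_Tn.
    """
    out1, out2 = v_dd, v_dd        # both nodes start precharged
    t = 0.0
    while out1 > v_tn:
        drive = (out1 - v_tn) / (v_dd - v_tn)   # 1 at start, 0 at cutoff
        out2 -= dt * drive * out2 / tau         # charge lost at Out2
        out1 -= dt * out1 / tau                 # first stage discharging
        t += dt
    return t, out1, out2


if __name__ == "__main__":
    t_off, out1, out2 = simulate_cascade()
    print(f"pull-down of stage 2 turns off at t = {t_off:.2f} tau")
    print(f"Out2 is left at {out2:.2f} V instead of 2.50 V")
```

Under these assumptions $Out_2$ settles well below $V_{DD}$ but above $V_{Tn}$, which is the intermediate level the text warns about. A faster first stage (smaller `tau` relative to the second stage) would reduce the loss but not eliminate it.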




Figure 6-63 Cascade of dynamic n-type blocks.

The cascading problem arises because the outputs of each gate—and thus the inputs to the next stages—are precharged to 1. This may cause inadvertent discharge in the beginning of the evaluation cycle. Setting all the inputs to 0 during precharge addresses that concern. When doing so, all transistors in the pull-down network are turned off after precharge, and no inadvertent discharging of the storage capacitors can occur during evaluation. In other words, correct operation is guaranteed as long as **the inputs can only make a single 0**  $\rightarrow$  1 transition during the evaluation period.<sup>5</sup> Transistors are turned on only when needed—and at most, once per cycle. A number of design styles complying with this rule have been conceived, but the two most important ones are discussed next.

#### **Domino Logic**

**Concept** A domino logic module [Krambeck82] consists of an *n*-type dynamic logic block followed by a static inverter (Figure 6-64). During precharge, the output of the *n*-type dynamic gate is charged up to  $V_{DD}$ , and the output of the inverter is set to 0. During evaluation, the dynamic gate conditionally discharges, and the output of the inverter makes a conditional transition from  $0 \rightarrow 1$ . If one assumes that all the inputs of a domino gate are outputs of other domino gates,<sup>6</sup> then it is ensured that all inputs are set to 0 at the end of the precharge phase, and that the only transitions during evaluation are  $0 \rightarrow 1$  transitions. Hence, the formulated rule is obeyed. The introduction of the static inverter has the additional advantage that the fan-out of the gate is driven by a static inverter with a low-impedance output, which increases noise immunity. Also, the buffer reduces the capacitance of the dynamic output node by separating internal and load capacitances. Finally, the inverter can be used to drive a bleeder device to combat leakage and charge redistribution, as shown in the second stage of Figure 6-64.

<sup>5</sup>This ignores the impact of charge distribution and leakage effects, discussed earlier.

<sup>6</sup>It is required that all other inputs that do not fall under this classification (for instance, primary inputs) stay constant during evaluation.




Figure 6-64 Domino CMOS logic.

Consider now the operation of a chain of domino gates. During precharge, all inputs are set to 0. During evaluation, the output of the first domino block either stays at 0 or makes a  $0 \rightarrow 1$  transition, affecting the second gate. This effect might ripple through the whole chain, one after the other, similar to a line of falling dominoes—hence the name. Domino CMOS has the following properties:

- Since each dynamic gate has a static inverter, only noninverting logic can be implemented. Although there are ways to deal with this, as discussed in a subsequent section, this is a major limiting factor, and pure domino design has thus become rare.
- Very high speeds can be achieved: only a rising edge delay exists, while $t_{pHL}$ equals zero. The inverter can be sized to match the *fan-out*, which is already much smaller than in the complementary static CMOS case, as only a single gate capacitance has to be accounted for per fan-out gate.
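The rippling behavior of a domino chain can be sketched at the logic level. The model below assumes a chain of identical domino buffers (each pull-down network is a single transistor driven by the previous stage) and advances one gate delay per iteration. The function name `evaluate_domino_chain` and this chain topology are illustrative assumptions.

```python
def evaluate_domino_chain(primary_input, n_stages):
    """Return the inverter outputs of all stages after each gate delay.

    ASSUMPTION: every stage is a domino buffer whose single pull-down
    transistor is driven by the previous stage's inverter output.
    """
    dyn = [1] * n_stages       # dynamic nodes right after precharge
    out = [0] * n_stages       # static inverter outputs right after precharge
    history = [out.copy()]
    for _ in range(n_stages):  # one gate delay per iteration
        previous = history[-1]
        for i in range(n_stages):
            stage_in = primary_input if i == 0 else previous[i - 1]
            if stage_in:       # pull-down conducts: dynamic node discharges
                dyn[i] = 0
        out = [1 - d for d in dyn]
        history.append(out.copy())
    return history


if __name__ == "__main__":
    for snapshot in evaluate_domino_chain(primary_input=1, n_stages=3):
        print(snapshot)
```

Each stage output changes at most once, and only from 0 to 1, so every input in the chain obeys the single-transition rule during evaluation.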

Since the inputs to a domino gate are low during precharge, it is tempting to eliminate the evaluation transistor because this reduces clock load and increases pull-down drive. However, eliminating the evaluation device extends the precharge cycle: the precharge now has to ripple through the logic network as well. Consider the logic network shown in Figure 6-65, where the evaluation devices have been eliminated. If the primary input $In_1$ is 1 during evaluation, the output of each dynamic gate evaluates to 0, and the output of each static inverter is 1. On the falling edge of the clock, the precharge operation is started. Assume further that $In_1$ makes a high-to-low transition. The input to the second gate is initially high, and it takes two gate delays before that input is driven low. During that time, the second gate cannot precharge its output, because its pull-down network is fighting the precharge device. Similarly, the third gate has to wait until the second gate has precharged. The time needed to precharge the logic circuit is therefore equal to its critical path. Another important negative is the extra power dissipated while the pull-down network and the precharge device conduct at the same time, which is a good reason to keep the evaluation devices.

Chapter 7 • Designing Sequential Logic Circuits

- 7.6 Nonbistable Sequential Circuits
  - 7.6.1 The Schmitt Trigger
  - 7.6.2 Monostable Sequential Circuits
  - 7.6.3 Astable Circuits
- 7.7 Perspective: Choosing a Clocking Strategy
- 7.8 Summary

- 7.9 To Probe Further

#### 7.1 Introduction

As described earlier, combinational logic circuits have the property that the output of a logic block is only a function of the *current* input values, assuming that enough time has elapsed for the logic gates to settle. Still, virtually all useful systems require storage of state information, leading to another class of circuits called *sequential logic* circuits. In these circuits, the output depends not only on the *current* values of the inputs, but also on *preceding* input values. In other words, a sequential circuit remembers some of the past history of the system—it has memory.

Figure 7-1 shows a block diagram of a generic *finite-state machine* (FSM) that consists of combinational logic and registers, which hold the system state. The system depicted here belongs to the class of *synchronous* sequential systems, in which all registers are under control of a single global clock. The outputs of the FSM are a function of the current *Inputs* and the *Current State*. The *Next State* is determined based on the *Current State* and the current *Inputs* and is fed to the inputs of registers. On the rising edge of the clock, the *Next State* bits are copied to the outputs of the registers (after some propagation delay), and a new cycle begins. The register then ignores changes in the input signals until the next rising edge. In general, registers can be *positive edge triggered* (where the input data is copied on the rising edge of the clock) or *negative edge triggered* (where the input data is copied on the falling edge, as indicated by a small circle at the clock input).

This chapter discusses the CMOS implementation of the most important sequential building blocks. A variety of choices in sequential primitives and clocking methodologies exist; making the correct selection is getting increasingly important in modern digital circuits, and can have a great impact on performance, power, and/or design complexity. Before embarking on a detailed discussion of the various design options, a review of the relevant design metrics and a classification of the sequential elements is necessary.

#### 7.1.1 Timing Metrics for Sequential Circuits

There are three important timing parameters associated with a register. They are shown in Figure 7-2. The setup time $(t_{su})$ is the time that the data inputs (D) must be valid before the clock transition (i.e., the $0 \rightarrow 1$ transition for a positive edge-triggered register). The hold time $(t_{hold})$ is the time the data input must remain valid after the clock edge. Assuming that the setup and hold times are met, the data at the D input is copied to the Q output after a worst case propagation delay (with reference to the clock edge) denoted by $t_{c-q}$.

Once we know the timing information for the registers and the combinational logic blocks, we can derive the system-level timing constraints (see Figure 7-1 for a simple system view). In synchronous sequential circuits, switching events take place concurrently in response to a clock stimulus. Results of operations await the next clock transitions before progressing to the next stage. In other words, the next cycle cannot begin unless all current computations have completed and the system has come to rest. The *clock period T*, at which the sequential circuit operates, must thus accommodate the longest delay of any stage in the network. Assume that the worst case propagation delay of the logic equals $t_{plogic}$, while its minimum delay (also called the *contamination delay*) is $t_{cdlogic}$. The minimum clock period T required for proper operation of the sequential circuit is given by

$$T \ge t_{c-q} + t_{plogic} + t_{su} \tag{7.1}$$

The hold time of the register imposes an extra constraint for proper operation, namely

$$t_{cdregister} + t_{cdlogic} \ge t_{hold} \tag{7.2}$$



Figure 7-2 Definition of setup time, hold time, and propagation delay of a synchronous register.


where  $t_{cdregister}$  is the minimum propagation delay (or contamination delay) of the register. This constraint ensures that the input data of the sequential elements is held long enough after the clock edge and is not modified too soon by the new wave of data coming in.
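The setup constraint of Eq. (7.1) and the hold constraint of Eq. (7.2) translate directly into two small checks. The function names and the numeric values in the usage section are hypothetical and chosen only for illustration; all delays are assumed to be in the same unit (picoseconds here).

```python
def min_clock_period(t_cq, t_plogic, t_su):
    """Smallest clock period allowed by Eq. (7.1)."""
    return t_cq + t_plogic + t_su


def hold_constraint_met(t_cd_register, t_cd_logic, t_hold):
    """True when Eq. (7.2) holds, i.e. new data does not arrive too early."""
    return t_cd_register + t_cd_logic >= t_hold


if __name__ == "__main__":
    # Hypothetical register and logic delays in picoseconds.
    period = min_clock_period(t_cq=100, t_plogic=800, t_su=50)
    print(f"minimum clock period: {period} ps")

    # Back-to-back registers with no logic in between (t_cd_logic = 0).
    print("hold met, no logic:  ", hold_constraint_met(40, 0, 60))
    # A little logic between the registers adds contamination delay.
    print("hold met, with logic:", hold_constraint_met(40, 30, 60))
```

The second check illustrates why the hold constraint matters most when there is little or no logic between registers: the contamination delay of the logic is then the only margin available.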

As seen from Eq. (7.1), it is important to minimize the values of the timing parameters associated with the register, as these directly affect the rate at which a sequential circuit can be clocked. In fact, modern high-performance systems are characterized by a very low logic depth, and the register propagation delay and setup times account for a significant portion of the clock period. For example, the DEC Alpha EV6 microprocessor [Gieseke97] has a maximum logic depth of 12 gates, and the register overhead accounts for approximately 15% of the clock period. In general, the requirement of Eq. (7.2) is not difficult to meet, although it becomes an issue when there is little or no logic between registers.<sup>1</sup>

#### 7.1.2 Classification of Memory Elements

#### **Foreground versus Background Memory**

At a high level, memory is classified into background and foreground memory. Memory that is embedded into logic is foreground memory and is most often organized as individual registers or register banks. Large amounts of centralized memory core are referred to as background memory. Background memory, discussed in Chapter 12, achieves higher area densities through efficient use of array structures and by trading off performance and robustness for size. In this chapter, we focus on foreground memories.

### Static versus Dynamic Memory


Memories can be either static or dynamic. Static memories preserve the state as long as the power is turned on. They are built by using positive feedback or regeneration, where the circuit topology consists of intentional connections between the output and the input of a combinational circuit. Static memories are most useful when the register will not be updated for extended periods of time. Configuration data, loaded at power-up time, is a good example of static data. This condition also holds for most processors that use conditional clocking (i.e., gated clocks) where the clock is turned off for unused modules. In that case, there are no guarantees on how frequently the registers will be clocked, and static memories are needed to preserve the state information. Memory based on positive feedback falls under the class of elements called multivibrator circuits. The bistable element is its most popular representative, but other elements such as monostable and astable circuits also are frequently used.

Dynamic memories store data for a short period of time, perhaps milliseconds. They are based on the principle of temporary charge storage on parasitic capacitors associated with MOS devices. As with dynamic logic, discussed earlier, the capacitors have to be refreshed periodically to compensate for charge leakage. Dynamic memories tend to be simpler, resulting in significantly higher performance and lower power dissipation. They are most useful in datapath circuits that require high performance levels and are periodically clocked. It is possible to use dynamic circuitry even when circuits are conditionally clocked, if the state can be discarded when a module goes into idle mode.

<sup>1</sup>Or when the clocks at different registers are somewhat out of phase due to clock skew. We discuss this topic in Chapter 10.

#### Latches versus Registers

A latch is an essential component in the construction of an *edge-triggered* register. It is a *level-sensitive* circuit that passes the D input to the Q output when the clock signal is high. This latch is said to be in *transparent* mode. When the clock is low, the input data sampled on the falling edge of the clock is held stable at the output for the entire phase, and the latch is in *hold* mode. The inputs must be stable for a short period around the falling edge of the clock to meet setup and hold requirements. A latch operating under these conditions is a *positive latch*. Similarly, a *negative latch* passes the D input to the Q output when the clock signal is low. Positive and negative latches are also called *transparent high* or *transparent low*, respectively. The signal waveforms for a positive and negative latch are shown in Figure 7-3. A wide variety of static and dynamic implementations exists for the realization of latches.

Contrary to level-sensitive latches, edge-triggered registers only sample the input on a clock transition—that is,  $0 \rightarrow 1$  for a positive edge-triggered register, and  $1 \rightarrow 0$  for a negative edge-triggered register. They are typically built to use the latch primitives of Figure 7-3. An often-recurring configuration is the master-slave structure, that cascades a positive and negative latch. Registers also can be constructed by using one-shot generators of the clock signal ("glitch" registers), or by using other specialized structures. Examples of these are shown later in this chapter.

The literature on sequential circuits has been plagued by ambiguous definitions for the different types of storage elements (i.e., register, flip-flop, and latch). To avoid confusion, we adhere strictly to the following set of definitions in this book:




- An edge-triggered storage element is called a register;
- A latch is a level-sensitive device;
- and any bistable component, formed by the cross coupling of gates, is called a flip-flop.<sup>2</sup>

# 7.2 Static Latches and Registers

#### 7.2.1 The Bistability Principle

Static memories use positive feedback to create a bistable circuit, that is, a circuit having two stable states that represent 0 and 1. The basic idea is shown in Figure 7-4a, which shows two inverters connected in cascade along with a voltage-transfer characteristic typical of such a circuit. Also plotted are the VTCs of the first inverter ($V_{o1}$ versus $V_{i1}$) and the second inverter ($V_{o2}$ versus $V_{o1}$). The latter plot is rotated to accentuate that $V_{i2} = V_{o1}$. Assume now that the output of the second inverter $V_{o2}$ is connected to the input of the first $V_{i1}$, as shown by the dotted lines in Figure 7-4a. The resulting circuit has only three possible operation points (A, B, and C), as demonstrated on the combined VTC. It is easy to prove the validity of the following important conjecture:

When the gain of the inverter in the transient region is larger than 1, A and B are the only stable operation points, and C is a metastable operation point.

Suppose that the cross-coupled inverter pair is biased at point C. A small deviation from this bias point, possibly caused by noise, is amplified and regenerated around the circuit loop.



Figure 7-4 Two cascaded inverters (a) and their VTCs (b).

<sup>2</sup>An edge-triggered register is often referred to as a flip-flop as well. In this text, flip-flop is used to **uniquely** mean




Figure 7-5 Metastable versus stable operation points.

This is a result of the gain around the loop being larger than 1. The effect is demonstrated in Figure 7-5a. A small deviation  $\delta$  is applied to  $V_{i1}$  (biased in C). This deviation is amplified by the gain of the inverter. The enlarged divergence is applied to the second inverter and amplified once more. The bias point moves away from C until one of the operation points A or B is reached. In conclusion, C is an unstable operation point. Every deviation (even the smallest one) causes the operation point to run away from its original bias. The chance is indeed very small that the cross-coupled inverter pair is biased at C and stays there. Operation points with this property are termed *metastable*.


On the other hand, A and B are stable operation points, as demonstrated in Figure 7-5b. At these points, the loop gain is much smaller than unity. Even a rather large deviation from the operation point shrinks and disappears.

Hence, the cross coupling of two inverters results in a *bistable* circuit—that is, a circuit with two stable states, each corresponding to a logic state. The circuit serves as a memory, storing either a 1 or a 0 (corresponding to positions A and B).
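The behavior of the three operation points can be reproduced with a small numeric experiment. The sketch below is an idealization, not a circuit simulation: the inverter VTC is modeled as a smooth `tanh` curve, and the normalized supply `VDD`, the steepness `GAIN`, and the function names are all assumptions made for illustration. With these values the magnitude of the inverter gain at the midpoint is 2.5, so the loop gain at C exceeds 1.

```python
import math

VDD = 1.0   # normalized supply voltage (assumption)
GAIN = 5.0  # steepness of the idealized VTC (assumption)


def inverter(v_in):
    """Idealized inverter VTC: a smooth tanh curve centered at VDD/2."""
    return 0.5 * VDD * (1.0 - math.tanh(GAIN * (v_in - 0.5 * VDD)))


def settle(v_i1, trips=50):
    """Send v_i1 around the two-inverter loop `trips` times."""
    for _ in range(trips):
        v_i1 = inverter(inverter(v_i1))
    return v_i1


if __name__ == "__main__":
    print(settle(0.5))    # biased exactly at C: stays put
    print(settle(0.501))  # small positive deviation: runs to the high state
    print(settle(0.499))  # small negative deviation: runs to the low state
    print(settle(0.2))    # large deviation from the low state: decays back
```

A deviation of one thousandth of the supply is enough to leave C, while a deviation of 20% of the supply around the low state still returns to it, which matches the distinction between metastable and stable points drawn above.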

In order to change the stored value, we must be able to bring the circuit from state A to B and vice versa. Since the precondition for stability is that the loop gain G is smaller than unity, we can achieve this by making A (or B) temporarily unstable by increasing G to a value larger than 1. This is generally done by applying a trigger pulse at  $V_{i1}$  or  $V_{i2}$ . For example, assume that the system is in position A ( $V_{i1} = 0$ ,  $V_{i2} = 1$ ). Forcing  $V_{i1}$  to 1 causes both inverters to be on simultaneously for a short time and the loop gain G to be larger than 1. The positive feedback regenerates the effect of the trigger pulse, and the circuit moves to the other state (B, in this case). The width of the trigger pulse need be only a little larger than the total propagation delay around the circuit loop, which is twice the average propagation delay of the inverters.

In summary, a bistable circuit has two stable states. In absence of any triggering, the circuit remains in a single state (assuming that the power supply remains applied to the circuit) and thus remembers a value. Another common name for a bistable circuit is *flip-flop*. A flip-flop is useful only if there also exists a means to bring it from one state to the other. In general, two different approaches may be used to accomplish this:

• Cutting the feedback loop. Once the feedback loop is open, a new value can easily be written into *Out* (or *Q*). Such a latch is called *multiplexer based*, because the logic expression for a synchronous latch is identical to the multiplexer equation:

$$Q = \overline{Clk} \cdot Q + Clk \cdot In \tag{7.3}$$

This approach is the most popular in today's latches, and thus forms the bulk of this section.

• Overpowering the feedback loop. By applying a trigger signal at the input of the flip-flop, a new value is forced into the cell by overpowering the stored value. A careful sizing of the transistors in the feedback loop and the input circuitry is necessary to make this possible. A weak trigger network may not succeed in overruling a strong feedback loop. This approach used to be in vogue in the earlier days of digital design, but has gradually fallen out of favor. It is, however, the dominant approach to the implementation of static background memories (which we discuss more fully in Chapter 12). A short introduction will be given later in the chapter.

#### 7.2.2 Multiplexer-Based Latches


The most robust and common technique to build a latch involves the use of transmission-gate multiplexers. Figure 7-6 shows an implementation of positive and negative static latches based on multiplexers. For a negative latch, input 0 of the multiplexer is selected when the clock is low, and the D input is passed to the output. When the clock signal is high, input 1 of the multiplexer, which connects to the output of the latch, is selected. The feedback ensures a stable output as long as the clock is high. Similarly in the positive latch, the D input is selected when the clock signal is high, and the output is held (using feedback) when the clock signal is low.
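The multiplexer view of a latch, Eq. (7.3), translates directly into a behavioral model. The following sketch is a zero-delay abstraction with hypothetical function names; it ignores setup and hold effects and only shows which input the multiplexer selects in each clock phase.

```python
def mux_latch(clk, d, q_prev, positive=True):
    """One evaluation of Eq. (7.3) for a level-sensitive latch.

    A positive latch is transparent while clk is 1; a negative latch is
    transparent while clk is 0. Otherwise the fed-back value is kept.
    """
    transparent = (clk == 1) if positive else (clk == 0)
    return d if transparent else q_prev


def run_latch(clk_wave, d_wave, positive=True, q0=0):
    """Apply the latch to sampled clock and data waveforms."""
    q, out = q0, []
    for clk, d in zip(clk_wave, d_wave):
        q = mux_latch(clk, d, q, positive)
        out.append(q)
    return out


if __name__ == "__main__":
    clk = [1, 1, 0, 0, 1, 1, 0, 0]
    d = [1, 0, 1, 1, 0, 1, 0, 1]
    print(run_latch(clk, d, positive=True))   # [1, 0, 0, 0, 0, 1, 1, 1]
    print(run_latch(clk, d, positive=False))  # [0, 0, 1, 1, 1, 1, 0, 1]
```

In the positive case the output follows D while the clock is 1 and freezes while it is 0; the negative case is the mirror image.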

A transistor-level implementation of a positive latch based on multiplexers is shown in Figure 7-7. When CLK is high, the bottom transmission gate is on and the latch is transparent; that is, the D input is copied to the Q output. During this phase, the feedback loop is open, since the top transmission gate is off. Sizing of the transistors therefore is not critical for realizing correct functionality. The number of transistors that the clock drives is an important metric from a power perspective, because the clock has an *activity factor* of 1. This particular latch implementation is not very efficient from this perspective: It presents a load of four transistors to the CLK signal.

Figure 7-7 Transistor-level implementation of a positive latch built by using transmission gates.

Figure 7-8 Multiplexer-based NMOS latch by using NMOS-only pass transistors for multiplexers: (a) schematic diagram; (b) nonoverlapping clocks.

It is possible to reduce the clock load to two transistors by implementing multiplexers that use NMOS-only pass transistors, as shown in Figure 7-8. When *CLK* is high, the latch samples the *D* input, while a low clock signal enables the feedback loop, and puts the latch in the hold mode. While attractive for its simplicity, the use of NMOS-only pass transistors results in the passing of a degraded high voltage of $V_{DD} - V_{Tn}$ to the input of the first inverter. This impacts both noise margin and the switching performance, especially in the case of low values of $V_{DD}$ and high values of $V_{Tn}$. It also causes static power dissipation in the first inverter, because the maximum input voltage to the inverter equals $V_{DD} - V_{Tn}$, and the PMOS device of the inverter is never fully turned off.

# 7.2.3 Master–Slave Edge-Triggered Register

The most common approach for constructing an *edge-triggered* register is to use a *master-slave* configuration, as shown in Figure 7-9. The register consists of cascading a negative latch (master stage) with a positive one (slave stage). A multiplexer-based latch is used in this particular implementation, although any latch could be used. On the low phase of the clock, the master stage is transparent, and the D input is passed to the master stage output, $Q_M$. During this period, the slave stage is in the hold mode, keeping its previous value by using feedback. On the rising edge of the clock, the master stage stops sampling the input, and the slave stage starts sampling. During the high phase of the clock, the slave stage samples the output of the master stage ($Q_M$), while the master stage remains in a hold mode. Since $Q_M$ is constant during the high phase of the clock, the output Q makes only one transition per cycle. The value of Q is the value of D right before the rising edge of the clock, achieving the positive edge-triggered effect. A negative edge-triggered register can be constructed by using the same principle by simply switching the order of the positive and negative latches (i.e., placing the positive latch first).

Figure 7-9 Positive edge-triggered register based on a master-slave configuration.

A complete transistor-level implementation of the master-slave positive edge-triggered register is shown in Figure 7-10. The multiplexer is implemented by using transmission gates as discussed in the previous section. When the clock is low ( $\overline{CLK} = 1$ ),  $T_1$  is on and  $T_2$  is off, and the *D* input is sampled onto node  $Q_M$ . During this period,  $T_3$  and  $T_4$  are off and on, respectively. The cross-coupled inverters ( $I_5$ ,  $I_6$ ) hold the state of the slave latch. When the clock goes high, the master stage stops sampling the input and goes into a hold mode.  $T_1$  is off and  $T_2$  is on, and the cross-coupled inverters  $I_2$  and  $I_3$  hold the state of  $Q_M$ . Also,  $T_3$  is on and  $T_4$  is off, and  $Q_M$  is copied to the output Q.
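The two-phase operation just described can be checked with a zero-delay behavioral model. The sketch below is an abstraction with a hypothetical function name; it models only which latch is transparent in each clock phase and ignores the inverters, transmission-gate delays, and setup and hold effects.

```python
def master_slave_register(clk_wave, d_wave, qm0=0, q0=0):
    """Behavioral model of a positive edge-triggered master-slave register.

    The master is a negative latch (transparent while clk == 0) and the
    slave is a positive latch (transparent while clk == 1).
    """
    qm, q, out = qm0, q0, []
    for clk, d in zip(clk_wave, d_wave):
        if clk == 0:
            qm = d   # master samples D; slave holds Q
        else:
            q = qm   # master holds Q_M; slave copies it to Q
        out.append(q)
    return out


if __name__ == "__main__":
    clk = [0, 0, 1, 1, 0, 0, 1, 1]
    d = [1, 1, 0, 0, 1, 0, 1, 1]
    print(master_slave_register(clk, d))  # [0, 0, 1, 1, 1, 1, 0, 0]
```

Q changes only at the samples where the clock rises, and each time it takes the value D had in the last sample of the preceding low phase. Changes of D during the high phase never reach the output.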



Figure 7-10 Master-slave positive edge-triggered register, using multiplexers.


#### Problem 7.1 Optimization of the Master-Slave Register

It is possible to remove the inverters  $I_1$  and  $I_4$  from Figure 7-10 without loss of functionality. Is there any advantage to including these inverters in the implementation?

#### **Timing Properties of Multiplexer-Based Master-Slave Registers**

The setup time is the time before the rising edge of the clock that the input data D must be valid. This is similar to asking the question, how long before the rising edge of the clock must the D input be stable such that $Q_M$ samples the value reliably? For the transmission gate multiplexer-based register, the input D has to propagate through $I_1$, $T_1$, $I_3$, and $I_2$ before the rising edge of the clock. This ensures that the node voltages on both terminals of the transmission gate $T_2$ are at the same value. Otherwise, it is possible for the cross-coupled pair $I_2$ and $I_3$ to settle to an incorrect value. The setup time is therefore equal to $3 \times t_{pd,inv} + t_{pd,tx}$.

The propagation delay is the time it takes for the value of  $Q_M$  to propagate to the output Q. Note that, since we included the delay of  $I_2$  in the setup time, the output of  $I_4$  is valid before the rising edge of the clock. Therefore, the delay  $t_{c-q}$  is simply the delay through  $T_3$  and  $I_6$  ( $t_{c-q} = t_{pd, tx} + t_{pd, inv}$ ).

The hold time represents the time that the input must be held stable after the rising edge of the clock. In this case, the transmission gate  $T_1$  turns off when the clock goes high. Since both the D input and the CLK pass through inverters before reaching  $T_1$ , any changes in the input after the clock goes high do not affect the output. Therefore, the hold time is 0.
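The three results above can be collected in a small helper. The function name and the delay values in the usage line are hypothetical; the formulas are the ones derived in the preceding paragraphs for the register of Figure 7-10.

```python
def mux_register_timing(t_pd_inv, t_pd_tx):
    """First-order timing of the transmission-gate master-slave register."""
    return {
        "t_su": 3 * t_pd_inv + t_pd_tx,  # D through I1, T1, I3 and I2
        "t_cq": t_pd_tx + t_pd_inv,      # Q_M through T3 and I6
        "t_hold": 0,                     # T1 turns off when the clock rises
    }


if __name__ == "__main__":
    # Hypothetical gate delays in picoseconds.
    print(mux_register_timing(t_pd_inv=40, t_pd_tx=30))
```

Because the setup time contains three inverter delays and the clock-to-output delay only one, moving delay out of the setup path is the more effective way to reduce the register overhead in Eq. (7.1).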

#### Example 7.1 Timing Analysis, Using SPICE

To obtain the setup time of the register while using SPICE, we progressively skew the input with respect to the clock edge until the circuit fails. Figure 7-11 shows the setup-time simulation assuming a skew of 210 ps and 200 ps. For the 210 ps case, the correct value of input D is sampled (in this case, the Q output remains at the value of  $V_{DD}$ ). For a skew of 200 ps, an incorrect value propagates to the output, as the Q output transitions to 0. Node  $Q_M$  starts to go high, and the output of  $I_2$  (the input to transmission gate  $T_2$ ) starts to fall. However, the clock is enabled before the two nodes across the transmission gate  $T_2$  settle to the same value. This results in an incorrect value being written into the master latch. The setup time for this register is 210 ps.
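The sweep procedure itself is easy to script. In the sketch below, `find_setup_time` is a hypothetical helper that reduces the skew step by step, and `toy_register` is only a stand-in for one transient simulation: its pass/fail limit of 205 ps is an assumption picked so that the sweep reproduces the 210 ps and 200 ps cases of this example. In practice the callback would launch the circuit simulator and inspect the Q waveform.

```python
def find_setup_time(captures_correctly, start_ps, stop_ps, step_ps):
    """Reduce the data-to-clock skew until capture fails.

    Returns the smallest swept skew (in ps) that still captured correctly,
    or None if even the starting skew fails.
    """
    last_good = None
    skew = start_ps
    while skew >= stop_ps:
        if not captures_correctly(skew):
            break
        last_good = skew
        skew -= step_ps
    return last_good


def toy_register(skew_ps, true_limit_ps=205):
    """Stand-in for one transient simulation (hypothetical pass/fail model)."""
    return skew_ps >= true_limit_ps


if __name__ == "__main__":
    print(find_setup_time(toy_register, start_ps=300, stop_ps=0, step_ps=10))  # 210
```

The resolution of the result is limited by the sweep step, so a finer step (or a bisection between the last passing and first failing skew) is needed for a tighter estimate.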

In a similar fashion, the hold time can be simulated. The *D*-input edge is once again skewed relative to the clock signal until the circuit stops functioning. For this design, the



Figure 7-13 Reduced-clock-load static master-slave register.



Figure 7-14 Reverse conduction possible in the transmission gate.

ratioed. Figure 7-13 shows that the feedback transmission gate can be eliminated by directly cross-coupling the inverters.

The penalty paid for the reduction in clock load is an increased design complexity. The transmission gate ($T_1$) and its source driver must overpower the feedback inverter ($I_2$) to switch the state of the cross-coupled inverter. The sizing requirements for the transmission gates can be derived by using an analysis similar to the one used for the sizing of the level-restoring device in Chapter 6. The input to the inverter $I_1$ must be brought below its switching threshold in order to make a transition. If minimum-sized devices are to be used in the transmission gates, it is essential that the transistors of inverter $I_2$ be made even weaker. This can be accomplished by making their channel lengths larger than minimum. Using minimum or close-to-minimum size devices in the transmission gates is desirable to reduce the power dissipation in the latches and the clock distribution network.

Another problem with this scheme is *reverse conduction*: the second stage can affect the state of the first latch. When the slave stage is on (Figure 7-14), it is possible for the combination of $T_2$ and $I_4$ to influence the data stored in the $I_1$-$I_2$ latch. As long as $I_4$ is a weak device, this is fortunately not a major problem.

#### **Non-Ideal Clock Signals**

So far, we have assumed that  $\overline{CLK}$  is a perfect inversion of CLK, or in other words, that the delay of the generating inverter is zero. Even if this were possible, this still would not be a good assumption. Variations can exist in the wires used to route the two clock signals, or the load capacitances can vary based on data stored in the connecting latches. This effect, known as *clock skew*, is a major problem, causing the two clock signals to overlap, as shown in Figure 7-15b. *Clock overlap* can cause two types of failures, which we illustrate for the NMOS-only negative master-slave register of Figure 7-15a.






- 1. When the clock goes high, the slave stage should stop sampling the master stage output and go into a hold mode. However, since CLK and  $\overline{CLK}$  are both high for a short period of time (the overlap period), both sampling pass transistors conduct, and there is a direct path from the D input to the Q output. As a result, data at the output can change on the rising edge of the clock, which is undesired for a negative edge-triggered register. This is known as a race condition in which the value of the output Q is a function of whether the input D arrives at node X before or after the falling edge of  $\overline{CLK}$ . If node X is sampled in the metastable state, the output will switch to a value determined by noise in the system.
- 2. The primary advantage of the multiplexer-based register is that the feedback loop is open during the sampling period, and therefore the sizing of the devices is not critical to functionality. However, if there is clock overlap between CLK and $\overline{CLK}$, node A can be driven by both D and B, resulting in an undefined state.

These problems can be avoided by using two nonoverlapping clocks instead, $PHI_1$ and $PHI_2$ (Figure 7-16), and by keeping the nonoverlap time $t_{non\_overlap}$ between the clocks large enough so that no overlap occurs even in the presence of clock-routing delays. During the nonoverlap time, the FF is in the high-impedance state: the feedback loop is open, the loop gain is zero, and the input is disconnected. Leakage will destroy the state if this condition holds for too long. Hence the name *pseudostatic*: the register employs a combination of static and dynamic storage approaches, depending upon the state of the clock.




### Problem 7.2 Generating Nonoverlapping Clocks

Figure 7-17 shows one possible implementation of the clock generation circuitry for generating a two-phase nonoverlapping clock. Assuming that each gate has a unit gate delay, derive the timing relationship between the input clock and the two output clocks. What is the nonoverlap period? How can this period be increased if needed?



Figure 7-17 Circuitry for generating a two-phase nonoverlapping clock.

### 7.2.4 Low-Voltage Static Latches

The scaling of supply voltages is critical for low-power operation. Unfortunately, certain latch structures do not function at reduced supply voltages. For example, without the scaling of device thresholds, NMOS-only pass transistors (e.g., Figure 7-16) don't scale well with supply voltage due to their inherent threshold drop. At very low power supply voltages, the input to the inverter cannot be raised above the switching threshold, resulting in incorrect evaluation. Even with the use of transmission gates, performance degrades significantly at reduced supply voltages.

Scaling to low supply voltages thus requires the use of reduced threshold devices. However, this has the negative effect of exponentially increasing the subthreshold leakage power (as discussed in Chapter 6). When the registers are constantly accessed, the leakage energy typically is insignificant compared with the switching power. However, with the use of conditional clocks, it is possible that registers are idle for extended periods, and the leakage energy expended by registers can be quite significant.

Many solutions are being explored to address the problem of high leakage during idle periods. One approach involves the use of Multiple Threshold devices, as shown in Figure 7-18 [Mutoh95]. Only the negative latch is shown. The shaded inverters and transmission gates are implemented in low-threshold devices. The low-threshold inverters are gated by using high-threshold devices to eliminate leakage.

During the normal mode of operation, the sleep devices are turned on. When the clock is low, the *D* input is sampled and propagates to the output. The latch is in the hold mode when the clock is high. The feedback transmission gate conducts and the cross-coupled feedback is enabled. An extra inverter, in parallel with the low-threshold one, is added to store the state when the latch is in *idle* (or *sleep*) mode. Then, the high-threshold devices in series with the low-threshold inverter are turned off (the *SLEEP* signal is high), eliminating leakage. It is assumed that clock



Figure 7-18 Solving the leakage problem, using multiple-threshold CMOS.


## 7.3 Dynamic Latches and Registers


Storage in a static sequential circuit relies on the concept that a cross-coupled inverter pair produces a bistable element and can thus be used to memorize binary values. This approach has the useful property that a stored value remains valid as long as the supply voltage is applied to the circuit—hence the name *static*. The major disadvantage of the static gate, however, is its complexity. When registers are used in computational structures that are constantly clocked (such as a pipelined datapath), the requirement that the memory should hold state for extended periods of time can be significantly relaxed.

This results in a class of circuits based on temporary storage of charge on parasitic capacitors. The principle is exactly identical to the one used in dynamic logic—charge stored on a capacitor can be used to represent a logic signal. The absence of charge denotes a 0, while its presence stands for a stored 1. No capacitor is ideal, unfortunately, and some charge leakage is always present. A stored value can thus only be kept for a limited amount of time, typically in the range of milliseconds. If one wants to preserve signal integrity, a periodic *refresh* of the value is necessary; hence, the name *dynamic* storage. Reading the value of the stored signal from a capacitor without disrupting the charge requires the availability of a device with a high-input impedance.

### 7.3.1 Dynamic Transmission-Gate Edge-Triggered Registers

A fully dynamic positive edge-triggered register based on the master-slave concept is shown in Figure 7-23. When CLK = 0, the input data is sampled on storage node 1, which has an equivalent capacitance of $C_1$, consisting of the gate capacitance of $I_1$, the junction capacitance of $T_1$, and the overlap gate capacitance of $T_1$. During this period, the slave stage is in a hold mode, with node 2 in a high-impedance (floating) state. On the rising edge of clock, the transmission gate $T_2$ turns on, and the value sampled on node 1 right before the rising edge propagates to the output Q (note that node 1 is stable during the high phase of the clock, since the first transmission gate is





such as channel length modulation and DIBL. Figure 7-22b plots the transient response for different device sizes and confirms that an individual W/L ratio of greater than 3 is required to overpower the feedback and switch the state of the latch:


WARNING: The dynamic circuits shown in this section are very appealing from the perspective of complexity, performance, and power. Unfortunately, robustness considerations limit their use. In a fully dynamic circuit like that shown in Figure 7-23, a signal net that is capacitively coupled to the internal storage node can inject significant noise and destroy the state. This is especially important in ASIC flows, where there is little control over coupling between signal nets and internal dynamic nodes. Leakage currents cause another problem: Most modern processors require that the clock can be slowed down or completely halted, to conserve power in low-activity periods. Finally, the internal dynamic nodes do not track variations in power supply voltage. For example, when CLK is high for the circuit in Figure 7-23, node A holds its state, but it does not track variations in the power supply seen by $I_1$. This results in reduced noise margins.

Most of these problems can be adequately addressed by adding a weak feedback inverter and making the circuit pseudostatic (Figure 7-25). While this comes at a slight cost in delay, it improves the noise immunity significantly. Unless registers are used in a highly-controlled environment (for instance, a custom-designed high-performance datapath), they should be made pseudostatic or static. This holds for all latches and registers discussed in this section.



Figure 7-25 Making a dynamic latch pseudostatic.

#### 7.3.2 C<sup>2</sup>MOS—A Clock-Skew Insensitive Approach

# The C<sup>2</sup>MOS Register


Figure 7-26 shows an ingenious positive edge-triggered register that is based on a master-slave concept insensitive to clock overlap. This circuit is called the C<sup>2</sup>MOS (Clocked CMOS) register [Suzuki73], and operates in two phases:

- 1. CLK = 0 ( $\overline{CLK} = 1$ ): The first tristate driver is turned on, and the master stage acts as an inverter sampling the inverted version of D on the internal node X. The master stage is in the evaluation mode. Meanwhile, the slave section is in a high-impedance mode, or in a hold mode. Both transistors  $M_7$  and  $M_8$  are off, decoupling the output from the input. The output Q retains its previous value stored on the output capacitor  $C_{L2}$ .
- 2. The roles are reversed when CLK = 1: The master stage section is in hold mode ($M_3$-$M_4$ off), while the second section evaluates ($M_7$-$M_8$ on). The value stored on $C_{L1}$ propagates to the output node through the slave stage, which acts as an inverter.

The overall circuit operates as a positive edge-triggered master-slave register very similar to the transmission-gate-based register presented earlier. However, there is an important difference:




Figure 7-27 between In and D, as illustrated by the arrows.

exists a time slot where both the NMOS and PMOS transistors are conducting. This creates a path between input and output that can destroy the state of the circuit. Simulations have shown that the circuit operates correctly as long as the clock rise time (or fall time) is smaller than approximately five times the propagation delay of the register. This criterion is not too stringent, and it is easily met in practical designs. The impact of the rise and fall times is illustrated in Figure 7-28, which plots the simulated transient response of a C<sup>2</sup>MOS D FF for clock slopes of, respectively, 0.1 and 3 ns. For slow clocks, the potential for a race condition exists.




# Dual-Edge Registers

So far, we have focused on edge-triggered registers that sample the input data on only one of the clock edges (rising or falling). It also is possible to design sequential circuits that sample the input on both edges. The advantage of this scheme is that a lower frequency clock—half the original rate—is distributed for the same functional throughput, resulting in power savings in the clock distribution network. Figure 7-29 shows a modification of the C<sup>2</sup>MOS register enabling sampling on both edges. It consists of two parallel master–slave edge-triggered registers, whose outputs are multiplexed by using tristate drivers.

When clock is high, the positive latch composed of transistors  $M_1-M_4$  is sampling the inverted D input on node X. Node Y is held stable, since devices  $M_9$  and  $M_{10}$  are turned off. On the falling edge of the clock, the top slave latch  $M_5-M_8$  turns on, and drives the inverted value of X to the Q output. During the low phase, the bottom master latch  $(M_1, M_4, M_9, M_{10})$  is turned on, sampling the inverted D input on node Y. Note that the devices  $M_1$  and  $M_4$  are reused, reducing the load on the D input. On the rising edge, the bottom slave latch conducts and drives the inverted version of Y on node Q. Data thus changes on both edges. Note that the slave latches operate in a complementary fashion—that is, only one of them is turned on during each phase of the clock.
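The interplay of the two parallel paths can be followed in a zero-delay behavioral model. The sketch below uses a hypothetical function name and abstracts the tristate stages into simple assignments: node X tracks the inverted D while the clock is high, node Y tracks it while the clock is low, and the slave that is on drives Q from whichever node is currently held. Initial node values are arbitrary assumptions.

```python
def dual_edge_register(clk_wave, d_wave, x0=1, y0=1):
    """Behavioral model of the dual-edge register of Figure 7-29."""
    x, y, out = x0, y0, []
    for clk, d in zip(clk_wave, d_wave):
        if clk == 1:
            x = 1 - d   # top master transparent: X follows inverted D
            q = 1 - y   # bottom slave drives Q from the held node Y
        else:
            y = 1 - d   # bottom master transparent: Y follows inverted D
            q = 1 - x   # top slave drives Q from the held node X
        out.append(q)
    return out


if __name__ == "__main__":
    clk = [0, 0, 1, 1, 0, 0, 1, 1]
    d = [1, 0, 0, 1, 1, 0, 0, 1]
    print(dual_edge_register(clk, d))  # [0, 0, 0, 0, 1, 1, 0, 0]
```

After the initial phase, Q is updated at every clock transition with the value D had just before that transition, so a clock of half the frequency delivers the same number of samples per unit time.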



Figure 7-29 C<sup>2</sup>MOS-based dual-edge triggered register.




Figure 7-39 The need for the shorting transistor M<sub>4</sub>.

# 7.5 Pipelining: An Approach to Optimize Sequential Circuits

Pipelining is a popular design technique often used to accelerate the operation of datapaths in digital processors. The concept is explained with the example of Figure 7-40a. The goal of the presented circuit is to compute  $\log(|a + b|)$ , where both a and b represent streams of numbers (i.e., the computation must be performed on a large set of input values). The minimal clock period  $T_{min}$  necessary to ensure correct evaluation is given as

$$T_{\min} = t_{c-q} + t_{pd,logic} + t_{su} \tag{7.7}$$

where  $t_{c-q}$  and  $t_{su}$  are the propagation delay and the setup time of the register, respectively. We assume that the registers are edge-triggered D registers. The term  $t_{pd,logic}$  stands for the worst case delay path through the combinational network, which consists of the adder, absolute value, and logarithm functions. In conventional systems (that don't push the edge of technology), the






latter delay is generally much larger than the delays associated with the registers and dominates the circuit performance. Assume that each logic module has an equal propagation delay. We note that each logic module is then active for only one-third of the clock period (if the delay of the register is ignored). For example, the adder unit is active during the first third of the period and remains idle (no useful computation) during the other two-thirds of the period. Pipelining is a technique to improve the resource utilization and increase the functional throughput. Assume that we introduce registers between the logic blocks, as shown in Figure 7-40b. This causes the computation for one set of input data to spread over a number of clock periods, as shown in Table 7-1. The result for the data set  $(a_1, b_1)$  only appears at the output after three clock periods.

Table 7-1 Example of pipelined computations.

| Clock Period | Adder       | Absolute Value            | Logarithm                       |
|--------------|-------------|---------------------------|---------------------------------|
| 1            | $a_1 + b_1$ |                           |                                 |
| 2            | $a_2 + b_2$ | $\lvert a_1 + b_1 \rvert$ |                                 |
| 3            | $a_3 + b_3$ | $\lvert a_2 + b_2 \rvert$ | $\log(\lvert a_1 + b_1 \rvert)$ |
| 4            | $a_4 + b_4$ | $\lvert a_3 + b_3 \rvert$ | $\log(\lvert a_2 + b_2 \rvert)$ |
| 5            | $a_5 + b_5$ | $\lvert a_4 + b_4 \rvert$ | $\log(\lvert a_3 + b_3 \rvert)$ |
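The schedule of the pipelined datapath can be reproduced with a short simulation. The sketch below is illustrative only: the function names, the use of `None` for an empty stage, and the delay values in picoseconds are assumptions. The pipelined clock period is computed with the standard result that only the slowest stage, rather than the sum of all stages, appears between two registers.

```python
import math


def pipeline_trace(inputs):
    """Simulate the adder / absolute value / logarithm pipeline.

    Returns one tuple (period, add_out, abs_out, log_out) per clock
    period; None marks a stage with no valid data in that period.
    """
    add_reg = abs_reg = None              # pipeline registers between stages
    rows = []
    stream = list(inputs) + [None, None]  # two extra periods drain the pipe
    for period, pair in enumerate(stream, start=1):
        # All three stages work in the same period on different data sets.
        add_out = None if pair is None else pair[0] + pair[1]
        abs_out = None if add_reg is None else abs(add_reg)
        log_out = None if abs_reg is None else math.log(abs_reg)
        rows.append((period, add_out, abs_out, log_out))
        add_reg, abs_reg = add_out, abs_out  # clock edge: registers capture
    return rows


def min_period(t_cq, t_su, stage_delays, pipelined):
    """Minimum clock period; all delays share one unit (here picoseconds)."""
    logic = max(stage_delays) if pipelined else sum(stage_delays)
    return t_cq + logic + t_su


trace = pipeline_trace([(1, 2), (-5, 1), (3, 3)])

# Assumed example delays: 100 ps clock-to-Q, 50 ps setup, 1000 ps per stage.
t_unpipelined = min_period(100, 50, [1000, 1000, 1000], pipelined=False)
t_pipelined = min_period(100, 50, [1000, 1000, 1000], pipelined=True)
```

With these assumed delays the clock period drops from 3150 ps to 1150 ps. The first result still needs three clock periods to emerge, but afterwards one result is produced in every period.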

# Chapter 2 FPGA Architectures: An Overview

Field Programmable Gate Arrays (FPGAs) were first introduced almost two and a half decades ago. Since then they have grown rapidly and have become a popular implementation medium for digital circuits. Advances in process technology have greatly increased the logic capacity of FPGAs and have in turn made them a viable implementation alternative for larger and more complex designs. Further, the programmable nature of their logic and routing resources has a dramatic effect on the area, speed, and power consumption of the final device.

This chapter covers different aspects of FPGAs. First, an overview of the basic FPGA architecture is presented. An FPGA comprises an array of programmable logic blocks that are connected to each other through a programmable interconnect network. Programmability in FPGAs is achieved through an underlying programming technology. This chapter first briefly discusses different programming technologies. Details of basic FPGA logic blocks and different routing architectures are then described. After that, an overview of the different steps involved in the FPGA design flow is given. The design flow starts with the hardware description of the circuit, which is then synthesized, technology mapped, and packed using different tools. Finally, the circuit is placed and routed on the architecture to complete the design flow.

The programmable logic and routing interconnect of FPGAs make them flexible and general purpose, but at the same time make them larger, slower, and more power consuming than standard cell ASICs. However, the advancement in process technology has enabled and necessitated a number of developments in the basic FPGA architecture. These developments aim to further improve the overall efficiency of FPGAs so that the gap between FPGAs and ASICs can be reduced. These developments and some future trends are presented in the last section of this chapter.

## 2.1 Introduction to FPGAs

Field Programmable Gate Arrays (FPGAs) are pre-fabricated silicon devices that can be electrically programmed in the field to become almost any kind of digital circuit or system. For low to medium volume production, FPGAs provide a cheaper solution and faster time to market than Application Specific Integrated Circuits (ASICs), which normally require a lot of time and money to obtain the first device. FPGAs, on the other hand, take less than a minute to configure and cost anywhere from a few hundred to a few thousand dollars. Also, for varying requirements, a portion of an FPGA can be partially reconfigured while the rest of the FPGA is still running. Future updates to the final product can be made simply by downloading a new application bitstream. However, the main advantage of FPGAs, their flexibility, is also the major cause of their drawbacks. The flexible nature of FPGAs makes them significantly larger, slower, and more power consuming than their ASIC counterparts. These disadvantages arise largely because of the programmable routing interconnect, which accounts for almost 90% of the total area of an FPGA. Despite these disadvantages, FPGAs present a compelling alternative for digital system implementation because of their shorter time to market and lower cost at low volumes.

Normally, FPGAs comprise:

- Programmable logic blocks which implement logic functions.
- Programmable routing that connects these logic functions.
- I/O blocks that are connected to logic blocks through routing interconnect and that make off-chip connections.

A generalized example of an FPGA is shown in Fig. 2.1 where configurable logic blocks (CLBs) are arranged in a two dimensional grid and are interconnected by programmable routing resources. I/O blocks are arranged at the periphery of the grid and they are also connected to the programmable routing interconnect. The "programmable/reconfigurable" term in FPGAs indicates their ability to implement a new function on the chip after its fabrication is complete. The reconfigurability/programmability of an FPGA is based on an underlying programming technology, which can cause a change in behavior of a pre-fabricated chip after its fabrication.

## 2.2 Programming Technologies

There are a number of programming technologies that have been used for reconfigurable architectures. Each of these technologies has different characteristics, which in turn have a significant effect on the programmable architecture. Some of the well-known technologies include static memory [122], flash [54], and anti-fuse [61].



Fig. 2.1 Overview of FPGA architecture [22]

### 2.2.1 SRAM-Based Programming Technology

Static memory cells are the basic cells used for SRAM-based FPGAs. Most commercial vendors [76, 126] use static memory (SRAM) based programming technology in their devices. These devices use static memory cells that are distributed throughout the FPGA to provide configurability. An example of such a memory cell is shown in Fig. 2.2. In an SRAM-based FPGA, SRAM cells are mainly used for the following purposes:

1. To program the routing interconnect of FPGAs, which is generally steered by small multiplexors.
2. To program Configurable Logic Blocks (CLBs) that are used to implement logic functions.

SRAM-based programming technology has become the dominant approach for FPGAs because of its re-programmability and its use of standard CMOS process technology, which leads to increased integration, higher speed, and lower dynamic power consumption with each new process of smaller geometry. There are, however, a number of drawbacks associated with SRAM-based programming technology. For example, an SRAM cell requires 6 transistors, which makes this technology costly in terms of area compared to other programming technologies. Further, SRAM cells are volatile in nature and external devices are required to permanently store the configuration data. These external devices add to the cost and area overhead of SRAM-based FPGAs.

Fig. 2.2 Static memory cell

### 2.2.2 Flash Programming Technology

One alternative to the SRAM-based programming technology is the use of flash or EEPROM based programming technology. Flash-based programming technology offers several advantages. For example, it is non-volatile in nature. It is also more area efficient than SRAM-based programming technology. Flash-based programming technology has its own disadvantages as well. Unlike SRAM-based devices, flash-based devices cannot be reconfigured/reprogrammed an infinite number of times. Also, flash-based technology uses a non-standard CMOS process.

### 2.2.3 Anti-fuse Programming Technology

An alternative to SRAM and flash-based technologies is anti-fuse programming technology. The primary advantage of anti-fuse programming technology is its low area. This technology also has lower on-resistance and parasitic capacitance than the other two programming technologies. Further, it is non-volatile in nature. There are, however, significant disadvantages associated with this programming technology. For example, it does not make use of a standard CMOS process. Also, devices based on anti-fuse programming technology cannot be reprogrammed.

In this section, an overview of three commonly used programming technologies has been given, each with its own advantages and disadvantages. Ideally, one would like to have a programming technology that is reprogrammable, non-volatile, and uses a standard CMOS process. None of the technologies presented above satisfies all of these conditions. Nevertheless, SRAM-based programming technology is the most widely used. The main reason is its use of a standard CMOS process, and for this very reason it is expected that this technology will continue to dominate the other two programming technologies.
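The comparison made in this section can be summarized as a small lookup structure. The sketch below only encodes the properties stated in the text; the dictionary layout and the function name `ideal_technologies` are assumptions chosen for illustration.

```python
# Properties of each programming technology as described in this section.
TECHNOLOGIES = {
    "SRAM": {
        "reprogrammable": True,
        "non_volatile": False,   # needs external storage for the configuration
        "standard_cmos": True,
    },
    "flash": {
        "reprogrammable": True,  # but only a finite number of times
        "non_volatile": True,
        "standard_cmos": False,
    },
    "anti-fuse": {
        "reprogrammable": False,
        "non_volatile": True,
        "standard_cmos": False,
    },
}


def ideal_technologies(table):
    """Names of technologies that satisfy all three desired properties."""
    return [name for name, props in table.items() if all(props.values())]


ideal = ideal_technologies(TECHNOLOGIES)
uses_standard_cmos = [n for n, p in TECHNOLOGIES.items() if p["standard_cmos"]]
```

The list `ideal` is empty, and SRAM is the only entry in `uses_standard_cmos`, which mirrors the argument for its dominance.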

## 2.3 Configurable Logic Block

A configurable logic block (CLB) is a basic component of an FPGA that provides the basic logic and storage functionality for a target application design. In principle, this basic component can be anything from a single transistor to an entire processor. These are the two extremes. At one end, a transistor is very fine-grained and requires a large amount of programmable interconnect, which results in an FPGA that suffers from area inefficiency, low performance, and high power consumption. At the other end, a processor is very coarse-grained and cannot implement small functions without wasting resources. Between these two extremes there exists a spectrum of basic logic blocks, including blocks made of NAND gates [101], an interconnection of multiplexors [44], lookup tables (LUTs) [121], and PAL-style wide-input gates [124]. Commercial vendors like Xilinx and Altera use LUT-based CLBs, which provide a good trade-off between too fine-grained and too coarse-grained logic blocks.

A CLB can comprise a single basic logic element (BLE) or a cluster of locally interconnected BLEs (as shown in Fig. 2.4). A simple BLE consists of a LUT and a flip-flop. A LUT with k inputs (LUT-k) contains  $2^k$  configuration bits and can implement any k-input Boolean function. Figure 2.3 shows a simple BLE comprising a 4-input LUT (LUT-4) and a D-type flip-flop. The LUT-4 uses 16 SRAM bits to implement any 4-input Boolean function. The output of the LUT-4 is connected to an optional flip-flop. A multiplexor selects the BLE output to be either the output of the flip-flop or the output of the LUT-4.
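A LUT is simply a small memory addressed by its inputs, which is why  $2^k$  configuration bits suffice for any k-input function. The following sketch models a BLE under a few assumptions that are not taken from the text: the names `program_lut`, `lut_index`, and `BLE`, the choice of input pin 0 as the least significant address bit, and a flip-flop that resets to 0.

```python
from itertools import product


def lut_index(inputs):
    """Treat the input pins as a binary address (pin 0 is the LSB)."""
    return sum(bit << pos for pos, bit in enumerate(inputs))


def program_lut(k, fn):
    """Return the 2**k configuration bits that make a LUT-k compute fn."""
    bits = [0] * (2 ** k)
    for inputs in product((0, 1), repeat=k):
        bits[lut_index(inputs)] = int(bool(fn(*inputs)))
    return bits


class BLE:
    """A LUT followed by an optional D flip-flop and an output multiplexor."""

    def __init__(self, config_bits, use_ff=False):
        self.config_bits = list(config_bits)
        self.use_ff = use_ff   # configuration bit of the output multiplexor
        self.ff = 0            # assumed reset state of the flip-flop

    def lut(self, inputs):
        return self.config_bits[lut_index(inputs)]

    def clock_edge(self, inputs):
        """On a clock edge the flip-flop captures the LUT output."""
        self.ff = self.lut(inputs)

    def output(self, inputs):
        return self.ff if self.use_ff else self.lut(inputs)


# Configure a LUT-4 as a 4-input XOR (parity) function.
xor4_bits = program_lut(4, lambda a, b, c, d: a ^ b ^ c ^ d)
combinational = BLE(xor4_bits)
registered = BLE(xor4_bits, use_ff=True)
```

Changing only `xor4_bits` turns the same BLE into any other 4-input function, and the `use_ff` flag plays the role of the output multiplexor select.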

A CLB can contain a cluster of BLEs connected through a local routing network. Figure 2.4 shows a cluster of 4 BLEs; each BLE contains a LUT-4 and a flip-flop. The BLE output is accessible to other BLEs of the same cluster through the local routing network. The number of output pins of a cluster is equal to the total number of BLEs in the cluster (with each BLE having a single output). However, the number of input pins of a cluster can be less than or equal to the sum of input pins required by all the BLEs in the cluster. Modern FPGAs typically contain 4 to 10 BLEs in a single cluster.

Although here we have discussed only basic logic blocks, many modern FPGAs contain a heterogeneous mixture of blocks, some of which can only be used for specific purposes. These special-purpose blocks, referred to here as hard blocks, include memories, multipliers, adders, and DSP blocks. Hard blocks are very efficient at implementing specific functions because they are designed optimally for them, yet they waste a large amount of logic and routing resources if left unused. A detailed discussion on the use of a heterogeneous mixture of blocks for implementing digital circuits is presented in Chap. 4, where the advantages and disadvantages of heterogeneous FPGA architectures and a remedy for the resource loss problem are discussed in detail.

Fig. 2.3 Basic logic element (BLE) [22]

## 2.4 FPGA Routing Architectures

As discussed earlier, in an FPGA, the computing functionality is provided by its programmable logic blocks and these blocks connect to each other through a programmable routing network. This programmable routing network provides routing connections among logic blocks and I/O blocks to implement any user-defined circuit. The routing interconnect of an FPGA consists of wires and programmable switches that form the required connections. These programmable switches are configured using the programming technology.

Fig. 2.4 A configurable logic block (CLB) having four BLEs [22]

Since FPGA architectures claim to be potential candidates for the implementation of any digital circuit, their routing interconnect must be very flexible so that it can accommodate a wide variety of circuits with widely varying routing demands. Although the routing requirements vary from circuit to circuit, certain common characteristics of these circuits can be used to design the routing interconnect optimally. For example, most designs exhibit locality and hence require abundant short wires. At the same time there are some distant connections, which lead to the need for sparse long wires. The routing interconnect must therefore be designed carefully to address both flexibility and efficiency. The arrangement of routing resources relative to the arrangement of logic blocks plays a very important role in the overall efficiency of the architecture. This arrangement is termed here the global routing architecture, whereas the microscopic details regarding the switching topology of the different switch blocks are termed the detailed routing architecture. On the basis of the global arrangement of routing resources, FPGA architectures can be categorized as either hierarchical [4] or island-style [22]. In this section, we present a detailed overview of both routing architectures.



Fig. 2.5 Overview of mesh-based FPGA architecture [22]

### 2.4.1 Island-Style Routing Architecture

Figure 2.5 shows a traditional island-style FPGA architecture (also termed mesh-based FPGA architecture). This is the most commonly used architecture among academic and commercial FPGAs. It is called island-style because the configurable logic blocks look like islands in a sea of routing interconnect. In this architecture, configurable logic blocks (CLBs) are arranged on a 2D grid and are interconnected by a programmable routing network. The Input/Output (I/O) blocks on the periphery of the FPGA chip are also connected to the programmable routing network. The routing network comprises pre-fabricated wiring segments and programmable switches that are organized in horizontal and vertical routing channels.

The routing network of an FPGA occupies 80–90% of the total area, whereas the logic occupies only 10–20% [22]. The flexibility of an FPGA depends mainly on its programmable routing network. A mesh-based FPGA routing network consists of horizontal and vertical routing tracks which are interconnected through switch boxes (SB). Logic blocks are connected to the routing network through connection boxes (CB). The flexibility of a connection box (Fc) is the number of routing tracks of the adjacent channel that are connected to a pin of a block. The connectivity of the input pins of logic blocks with the adjacent routing channel is called Fc(in); the connectivity of the output pins of logic blocks with the adjacent routing channel is called Fc(out). An Fc(in) equal to 1.0 means that all the tracks of the adjacent routing channel are connected to the input pin of the logic block. The flexibility of a switch box (Fs) is the total number of tracks to which every track entering the switch box connects. The number of tracks in a routing channel is called the channel width of the architecture. The same channel width is used for all horizontal and vertical routing channels of the architecture. An example explaining the switch box flexibility, connection box flexibility, and routing channel width is shown in Fig. 2.6. In this figure the switch box has Fs = 3, as each track incident on it is connected to 3 tracks of adjacent routing channels. Similarly, the connection box has Fc(in) = 0.5, as each input of the logic block is connected to 50% of the tracks of the adjacent routing channel.
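The flexibility parameters can be made concrete with a small model. The sketch below assumes one particular switch box topology, the so-called disjoint (or subset) pattern, in which track i on one side connects only to track i on the other three sides; the topology used in Fig. 2.6 may differ. The names `disjoint_switch_box` and `connection_box_tracks`, and the channel width of 4, are assumptions made for illustration.

```python
SIDES = ("north", "east", "south", "west")


def disjoint_switch_box(channel_width):
    """Map every (side, track) to the tracks it can be switched onto.

    Disjoint pattern: track i connects to track i on each other side.
    """
    return {
        (side, track): [(other, track) for other in SIDES if other != side]
        for side in SIDES
        for track in range(channel_width)
    }


def switch_box_flexibility(switch_box):
    """Fs: number of tracks each incoming track connects to (must be uniform)."""
    fan_outs = {len(targets) for targets in switch_box.values()}
    assert len(fan_outs) == 1, "non-uniform switch box"
    return fan_outs.pop()


def connection_box_tracks(channel_width, fc):
    """Number of adjacent-channel tracks a pin connects to for a fractional Fc."""
    return round(fc * channel_width)


sb = disjoint_switch_box(channel_width=4)
fs = switch_box_flexibility(sb)
tracks_per_input = connection_box_tracks(channel_width=4, fc=0.5)
```

For a channel width of 4 this gives Fs = 3 and two tracks per input pin at Fc(in) = 0.5, matching the values quoted for Fig. 2.6.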

The routing tracks connected through a switch box can be bidirectional or unidirectional (also called directional). Figure 2.7 shows a bidirectional and a unidirectional switch box, both with Fs equal to 3. The input tracks (or wires) in both switch boxes connect to 3 other tracks of the same switch box. The only limitation of a unidirectional switch box is that its routing channel width must be a multiple of 2.

Generally, the output pins of a block can connect to any routing track through pass transistors. Each pass transistor forms a tristate output that can be independently turned on or off. However, a single-driver wiring technique can also be used to connect the output pins of a block to the adjacent routing tracks. For single-driver wiring, tristate elements cannot be used; the output of the block needs to be connected to the neighboring routing network through multiplexors in the switch box. Modern commercial FPGA architectures have moved towards single-driver, directional routing tracks. The authors in [51] show that if single-driver directional wiring is used instead of bidirectional wiring, a 25% improvement in area, 9% in delay, and 32% in area-delay can be achieved. All these advantages are achieved without making any major changes in the FPGA CAD flow.
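The three improvement figures quoted from [51] are consistent with each other if the area-delay metric is taken to be the product of area and delay. That interpretation is an assumption here, not something stated in the text, and the short check below is only a sanity calculation.

```python
area_improvement = 0.25    # 25% smaller area
delay_improvement = 0.09   # 9% smaller delay

# Remaining area-delay product relative to the bidirectional baseline.
relative_area_delay = (1 - area_improvement) * (1 - delay_improvement)
area_delay_improvement = 1 - relative_area_delay

rounded_percent = round(area_delay_improvement * 100)
```

The product of 0.75 and 0.91 is about 0.68, so the area-delay product improves by roughly 32%.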

In mesh-based FPGAs, multi-length wires are created to reduce delay. Figure 2.8 shows an example of different length wires. Longer wire segments span multiple blocks and require fewer switches, thereby reducing routing area and delay. However, they also decrease routing flexibility, which reduces the probability to route a hardware circuit successfully. Modern commercial FPGAs commonly use a combination of long and short wires to balance flexibility, area and delay of the routing network.



Fig. 2.8 Channel segment distribution

#### 2.4.1.1 Altera's Stratix II Architecture

So far, we have presented a general overview of the island-style routing architecture. We now present a commercial example of this kind of architecture. Altera's Stratix II [106] architecture is an industrial example of an island-style FPGA (Fig. 2.9). The logic structure consists of LABs (Logic Array Blocks), memory blocks, and digital signal processing (DSP) blocks. LABs are used to implement general-purpose logic, and are symmetrically distributed in rows and columns throughout the device fabric. The DSP blocks are custom designed to implement full-precision multipliers of different granularities, and are grouped into columns. Input- and output-only elements (IOEs) represent the external interface of the device. IOEs are located along the periphery of the device.

Fig. 2.9 Altera's Stratix II block diagram

Each Stratix II LAB consists of eight Adaptive Logic Modules (ALMs). An ALM consists of 2 adaptive LUTs (ALUTs) with eight inputs altogether. Construction of an ALM allows implementation of 2 separate 4-input Boolean functions. Further, an ALM can also be used to implement any six-input Boolean function, and some seven-input functions. In addition to lookup tables, an ALM provides 2 programmable registers, 2 dedicated full-adders, a carry chain, and a register-chain. Full-adders and carry chain can be used to implement arithmetic operations, and the register-chain is used to build shift registers. Outputs of an ALM drive all types of interconnect provided by the Stratix II device. Figure 2.10 illustrates a LAB interconnect interface.

Interconnections between LABs, RAM blocks, DSP blocks and the IOEs are established using the Multi-track interconnect structure. This interconnect structure consists of wire segments of different lengths and speeds. The interconnect wire-segments span fixed distances, and run in the horizontal (row interconnects) and vertical (column interconnects) directions. The row interconnects (Fig. 2.11) can be used to route signals between LABs, DSP blocks, and memory blocks in the same row. Row interconnect resources are of the following types:

- Direct connections between LABs and adjacent blocks.
- R4 resources that span 4 blocks to the left or right.
- R24 resources that provide high-speed access across 24 columns.

Fig. 2.10 Stratix-II logic array block (LAB) structure

Each LAB owns its set of R4 interconnects. A LAB has approximately equal numbers of driven-left and driven-right R4 interconnects. An R4 interconnect that is driven to the left can be driven by either the primary LAB (Fig. 2.11) or the adjacent LAB to the left.

Similarly, a driven-right R4 interconnect may be driven by the primary LAB or the LAB immediately to its right. Multiple R4 resources can be connected to each other to establish longer connections within the same row. R4 interconnects can also drive C4 and C16 column interconnects, and R24 high speed row resources.

Column interconnect structure is similar to row interconnect structure. Column interconnects include:

- Carry chain interconnects within a LAB, and from LAB to LAB in the same column.
- Register chain interconnects.
- C4 resources that span 4 blocks in the up and down directions.
- C16 resources for high-speed vertical routing across 16 rows.

Carry chain and register chain interconnects are separated from local interconnect (Fig. 2.10) in a LAB. Each LAB has its own set of driven-up and driven-down C4 interconnects. C4 interconnects can also be driven by the LABs that are immediately adjacent to the primary LAB. Multiple C4 resources can be connected to each other to form longer connections within a column, and C4 interconnects can also drive row interconnects to establish column-to-column interconnections. C16 interconnects are high-speed vertical resources that span 16 LABs. A C16 interconnect can drive row and column interconnects at every fourth LAB. A LAB local interconnect structure cannot be directly driven by a C16 interconnect; only C4 and R4 interconnects can drive a LAB local interconnect structure. Figure 2.12 shows the C4 interconnect structure in the Stratix II device.

Fig. 2.11 R4 interconnect connections

### 2.4.2 Hierarchical Routing Architecture

Most logic designs exhibit locality of connections, which implies a hierarchy in the placement and routing of connections between different logic blocks. Hierarchical routing architectures exploit this locality by dividing FPGA logic blocks into separate groups/clusters. These clusters are recursively connected to form a hierarchical structure. In a hierarchical architecture (also termed tree-based architecture), connections between logic blocks within the same cluster are made by wire segments at the lowest level of the hierarchy. However, a connection between blocks residing in different groups requires the traversal of one or more levels of the hierarchy. In a hierarchical architecture, the signal bandwidth varies as we move away from the bottom level, and it is generally widest at the top level of the hierarchy. The hierarchical routing architecture has been used in a number of commercial FPGA families including the Altera Flex10K [10], Apex [15] and ApexII [16] architectures. In this text, the term multilevel hierarchical interconnect covers architectures with more than 2 levels of hierarchy as well as tree-based ones.



Fig. 2.12 C4 interconnect connections

#### 2.4.2.1 HFPGA: Hierarchical FPGA

In the hierarchical FPGA called HFPGA, LBs are grouped into clusters. Clusters are then recursively grouped together (see Fig. 2.13). The clustered VPR mesh architecture [22] has a hierarchical topology with only two levels. Here we consider multilevel hierarchical architectures with more than 2 levels. Various hierarchical structures were discussed in [1] and [129]. The routability of an HFPGA depends on its switch box topologies. HFPGAs comprising fully populated switch boxes ensure 100% routability but carry a heavy area penalty. The authors of [129] explored the HFPGA architecture, investigating how the switch pattern can be partly depopulated while maintaining good routability.



Fig. 2.13 Hierarchical FPGA topology

#### 2.4.2.2 HSRA: Hierarchical Synchronous Reconfigurable Array

An example of an academic hierarchical routing architecture is shown in Fig. 2.14. It has a strictly hierarchical, tree-based interconnect structure. In this architecture, the only wire segments that directly connect to the logic units are located at the leaves of the interconnect tree. All other wire segments are decoupled from the logic structure. A logic block of this architecture consists of a pair of 2-input Look Up Table (2-LUT) and a D-type Flip Flop (D-FF). The input-pin connectivity is based on a choose-k strategy [4], and the output pins are fully connected. The richness of this interconnect structure is defined by its base channel width *c* and interconnect growth rate *p*. The base channel width *c* is defined as the number of tracks at the leaves of the interconnect tree. The interconnect growth rate *p* is the rate at which the interconnect bandwidth grows towards the upper levels. The interconnect growth rate can be realized using either non-compressing or compressing switch blocks. The details of these switch blocks are as follows:

- Non-compressing (2:1) switch blocks: The number of tracks at the upper level is equal to the sum of the number of tracks of the children at the lower level. For example, in Fig. 2.14, non-compressing switch blocks are used between levels 1, 2 and levels 3, 4.
- Compressing (1:1) switch blocks: The number of tracks at the upper level is equal to the number of tracks of either child at the lower level. For example, in Fig. 2.14, compressing switch blocks are used between levels 2 and 3.

A repeating combination of non-compressing and compressing switch blocks can be used to realize any value of p less than one. For example, a repeating pattern of (2:1, 1:1) switch blocks realizes p = 0.5, while the pattern (2:1, 2:1, 1:1) realizes p = 0.67. An architecture that has only 2:1 switch blocks provides a growth rate of p = 1.
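The quoted growth rates follow if the bandwidth above a subtree of N logic blocks is taken to scale as  $c \cdot N^p$ . In a binary tree N doubles at every level, while the track count doubles only at 2:1 switch blocks, so p equals the fraction of 2:1 blocks in the repeating pattern. This scaling interpretation and the function names below are assumptions made for illustration; they reproduce the examples in the text.

```python
def growth_rate(pattern):
    """Growth rate p for a repeating pattern of '2:1' and '1:1' switch blocks."""
    return pattern.count("2:1") / len(pattern)


def channel_widths(base_width, pattern, levels):
    """Track count at each tree level, starting from the leaves."""
    widths = [base_width]
    for level in range(levels):
        kind = pattern[level % len(pattern)]
        # A 2:1 block sums the tracks of both children; a 1:1 block keeps
        # the track count of a single child.
        widths.append(widths[-1] * 2 if kind == "2:1" else widths[-1])
    return widths


p_half = growth_rate(("2:1", "1:1"))
p_two_thirds = growth_rate(("2:1", "2:1", "1:1"))
p_one = growth_rate(("2:1",))

# Assumed base channel width c = 3 over four levels of hierarchy.
widths = channel_widths(3, ("2:1", "1:1"), levels=4)
```

With c = 3 and the pattern (2:1, 1:1), the channel grows from 3 tracks at the leaves to 12 tracks four levels up, whereas a tree of only 2:1 blocks would reach 48 tracks.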

Another hierarchical routing architecture is presented in [132], where the global routing architecture (i.e. the position of routing resources relative to logic resources of the architecture) remains the same as in [4]. However, there are several key differences at the level of detailed routing architecture (i.e. the way the routing resources are connected to each other, the flexibility of switch blocks, etc.) that separate the two architectures. For example, the architecture shown in Fig. 2.14 has one bidirectional interconnect that uses bidirectional switches, and it supports only arity-2 (i.e. each cluster can contain only two sub-clusters). In contrast, the architecture presented in [132] supports two separate unidirectional interconnect networks: a downward network and an upward network. Further, this architecture is more flexible as it can support logic blocks of different sizes, and the clusters/groups of the routing architecture can have different arity sizes. Further details of this architecture, from now on alternatively termed tree-based architecture, are presented in the next chapter.

Fig. 2.14 Example of hierarchical routing architecture [4]





#### 2.4.2.3 APEX: Altera

The *APEX* architecture is a commercial product from Altera Corporation which includes 3 levels of interconnect hierarchy. Figure 2.15 shows a diagram of the APEX 20K400 programmable logic device. The basic logic-element (LE) is a 4-input LUT and DFF pair. Ten LEs are grouped into a logic-array-block or LAB. Interconnect within a LAB is complete, meaning that a connection from the output of any LE to the input of another LE in its LAB always exists, and any signal entering the input region can reach every LE.

Groups of 16 LABs form a MegaLab. Interconnect within a MegaLab requires an LE to drive a GH (MegaLab global H) line, a horizontal line, which switches into the input region of any other LAB in the same MegaLab. Adjacent LABs have the ability to interleave their input regions, so an LE in  $LAB_i$  can usually drive  $LAB_{i+1}$  without using a GH line. A 20K400 MegaLab contains 279 GH lines.

The top-level architecture is a 4 by 26 array of MegaLabs. Communication between MegaLabs is accomplished by global H (horizontal) and V (vertical) wires that switch at their intersection points. The H and V lines are segmented by a bidirectional segmentation buffer at the horizontal and vertical centers of the chip. In Fig. 2.15, we denote the use of a single (half-chip) line as H or V and a double or full-chip line through the segmentation buffer as HH or VV. The 20K400 contains 100 H lines per MegaLab row, and 80 V lines per LAB-column.
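The three levels of hierarchy described above determine the logic capacity of the device. The following short calculation uses only the counts given in the text; the variable names are chosen for illustration.

```python
LES_PER_LAB = 10              # level 1: LEs grouped into a LAB
LABS_PER_MEGALAB = 16         # level 2: LABs grouped into a MegaLab
MEGALAB_ARRAY = (4, 26)       # level 3: "4 by 26 array of MegaLabs"

megalabs = MEGALAB_ARRAY[0] * MEGALAB_ARRAY[1]
labs = megalabs * LABS_PER_MEGALAB
logic_elements = labs * LES_PER_LAB
```

This gives 104 MegaLabs, 1664 LABs, and 16,640 logic elements, each LE being one 4-input LUT and flip-flop pair.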

In this section, so far we have given an overview of the two routing architectures that are commonly employed in FPGAs. Both architectures have their strengths and weaknesses. For example, hierarchical routing architectures exploit the locality exhibited by most designs and in turn offer smaller delays and more predictable routing than island-style architectures. The speed of a net is determined by the number of routing switches it has to pass through and the length of its wires. In a mesh-based architecture, the number of segments increases linearly with the Manhattan distance d between the logic blocks to be connected. In a tree-based architecture, however, the number of segments increases only logarithmically with d [82]. This fact is illustrated in Fig. 2.16. On the other hand, scalability is an issue in hierarchical routing architectures and there might be some design mapping issues. A mesh-based architecture has no such issues, as it offers a tile-based layout where a tile, once formed, can be replicated horizontally and vertically to make an architecture as large as we wish.
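The difference in scaling can be sketched with two simple cost functions. The model below is a deliberate simplification and not the analysis of [82]: the mesh is assumed to use only unit-length segments, the tree is assumed to have a fixed arity with leaves numbered from left to right, and the function names are hypothetical.

```python
def mesh_segments(src, dst):
    """Unit-length segments needed between two (x, y) grid positions."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def tree_levels(leaf_a, leaf_b, arity=2):
    """Levels to climb until two leaves share a common ancestor.

    The route climbs this many levels and then descends the same number.
    """
    levels = 0
    while leaf_a != leaf_b:
        leaf_a //= arity   # move to the parent cluster
        leaf_b //= arity
        levels += 1
    return levels


# Two blocks at opposite ends of a row of 1024 blocks.
far_mesh = mesh_segments((0, 0), (1023, 0))
far_tree = tree_levels(0, 1023)

# Neighbouring blocks that happen to sit in different halves of the tree.
near_mesh = mesh_segments((511, 0), (512, 0))
near_tree = tree_levels(511, 512)
```

For the distant pair the mesh needs 1023 segments while the tree climbs only 10 levels, the base-2 logarithm of 1024. The neighbouring pair shows the other side of the trade-off: physical adjacency does not guarantee a short path in a tree, which is one reason placement matters for hierarchical architectures.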

## 2.5 Software Flow

FPGA architectures have been intensely investigated over the past two decades. A major aspect of FPGA architecture research is the development of Computer Aided Design (CAD) tools for mapping applications to FPGAs. It is well established that the quality of an FPGA-based implementation is largely determined by the effectiveness of the accompanying suite of CAD tools. The benefits of an otherwise well designed, feature rich FPGA architecture might be impaired if the CAD tools cannot take advantage of the features that the FPGA provides. Thus, CAD algorithm research is essential to the architectural advancement needed to narrow the performance gap between FPGAs and other computational devices like ASICs.

The software flow (CAD flow) takes an application design description in a Hardware Description Language (HDL) and converts it to a stream of bits that is eventually programmed on the FPGA. The process of converting a circuit description into a format that can be loaded into an FPGA can be roughly divided into five distinct steps, namely: synthesis, technology mapping, packing, placement, and routing. The final output of FPGA CAD tools is a bitstream that configures the state of the memory bits in an FPGA. The state of these bits determines the logical function that the FPGA implements. Figure 2.17 shows a generalized software flow for programming an application circuit on an FPGA architecture. A description of the various modules of the software flow is given in the remainder of this section. The details of these modules are generally independent of the kind of routing architecture being used, and they apply to both architectures described earlier unless otherwise specified.

Fig. 2.17 FPGA software flow

#### 2.5.1 Logic Synthesis

The FPGA flow starts with the logic synthesis of the netlist to be mapped onto the device. Logic synthesis [26, 27] transforms an HDL description (VHDL or Verilog) into a set of Boolean gates and flip-flops. The synthesis tools transform the register-transfer-level (RTL) description of a design into a hierarchical Boolean network. Various technology-independent techniques are applied to optimize the Boolean network. The typical cost function of technology-independent optimizations is the total literal count of the factored representation of the logic function. The literal count correlates very well with the circuit area. Further details of logic synthesis are beyond the scope of this book.

Fig. 2.18 Directed acyclic graph representation of a circuit

#### 2.5.2 Technology Mapping

The output from synthesis tools is a circuit description of Boolean logic gates, flip-flops and wiring connections between these elements. The circuit can also be represented by a Directed Acyclic Graph (DAG). Each node in the graph represents a gate, flip-flop, primary input or primary output. Each edge in the graph represents a connection between two circuit elements. Figure 2.18 shows an example of a DAG representation of a circuit. Given a library of cells, the technology mapping problem can be expressed as finding a network of cells that implements the Boolean network. In the FPGA technology mapping problem, the library of cells is composed of k-input LUTs and flip-flops. Therefore, FPGA technology mapping involves transforming the Boolean network into k-bounded cells. Each cell can then be implemented as an independent k-LUT. Figure 2.19 shows an example of transforming a Boolean network into k-bounded cells.

Fig. 2.19 Example of technology mapping

Technology mapping algorithms can optimize a design for a set of objectives including depth, area or power. The FlowMap algorithm [64] is the most widely used academic tool for FPGA technology mapping. FlowMap is a breakthrough in FPGA technology mapping because it is able to find a depth-optimal solution in polynomial time. FlowMap guarantees depth optimality at the expense of logic duplication. Since the introduction of FlowMap, numerous technology mappers have been designed that optimize for area and run-time while still maintaining the depth-optimality of the circuit [65–67]. The technology mapping step produces a network of k-bounded LUTs and flip-flops.
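To make the notion of a k-bounded cell concrete, the following Python sketch shows how any Boolean function of k inputs can be stored as a truth table of 2^k bits, which is what a k-LUT holds. It is only an illustration and not part of any mapping tool; the helper names `make_lut` and `eval_lut` and the bit ordering are assumptions made here.

```python
from itertools import product

def make_lut(func, k):
    """Tabulate a k-input Boolean function into a 2**k-entry truth table."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]

def eval_lut(table, inputs):
    """Read the LUT output; inputs[0] is taken as the most significant address bit."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | bit
    return table[addr]

# A node computing f = (a AND b) OR c has three inputs, so it fits in one 3-LUT.
lut = make_lut(lambda a, b, c: (a and b) or c, 3)
print(lut, eval_lut(lut, (1, 1, 0)))
```

A node with more than k inputs cannot be tabulated this way in a single LUT, which is why the mapper has to decompose the network into k-bounded cells first.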

#### 2.5.3 Clustering/Packing

The logic elements in a mesh-based FPGA are typically arranged in two levels of hierarchy. The first level consists of logic blocks (LBs), which are k-input LUT and flip-flop pairs. The second level of the hierarchy groups k LBs together to form logic block clusters. The clustering phase of the FPGA CAD flow is the process of forming groups of k LBs. These clusters can then be mapped directly to a logic element on an FPGA. Figure 2.20 shows an example of the clustering process.

Clustering algorithms can be broadly categorized into three general approaches, namely top-down [39, 78], depth-optimal [84, 100] and bottom-up [14, 17, 43]. Top-down approaches partition the LBs into clusters by successively subdividing the network or by iteratively moving LBs between parts. Depth-optimal solutions attempt to minimize delay at the expense of logic duplication. Bottom-up approaches are generally preferred for FPGA CAD tools due to their fast run times and reasonable timing delays. They only consider local connectivity information and can easily satisfy cluster pin constraints. Top-down approaches offer the best solutions; however, their computational complexity can be prohibitive.

#### 2.5.3.1 Bottom-up Approaches

Bottom-up approaches build clusters sequentially, one at a time. The process starts by choosing an LB which acts as a cluster seed. LBs are then greedily selected and added to the cluster, applying various attraction functions. The VPack [14] attraction function is based on the number of shared nets between a candidate LB and the LBs that are already in the cluster. For each cluster, the attraction function is used to select a seed LB from the set of all LBs that have not already been packed. After packing a seed LB into the new cluster, a second attraction function selects new LBs to pack into the cluster. LBs are packed into the cluster until the cluster reaches full capacity or all cluster inputs have been used. If all cluster inputs become occupied before the cluster reaches full capacity, a hill-climbing technique is applied, searching for LBs that do not increase the number of inputs used by the cluster. The VPack pseudo-code is outlined in algorithm 2.1.

Fig. 2.20 Example of packing

T-VPack [22] is a timing-driven version of VPack which gives added weight to grouping LBs on the critical path together. The algorithm is identical to VPack, except that the attraction functions which select the LBs to be packed into the clusters are different. The VPack seed function chooses LBs with the most used inputs, whereas the T-VPack seed function chooses LBs that are on the most critical path. VPack's second attraction function chooses LBs with the largest number of connections with the LBs already packed into the cluster. T-VPack's second attraction function has two components for an LB *B* being considered for cluster *C*:

$$Attraction(B, C) = \alpha \cdot Crit(B) + (1 - \alpha) \frac{|Nets(B) \cap Nets(C)|}{G} \tag{2.1}$$

where Crit(B) is a measure of how close LB *B* is to being on the critical path, *Nets*(*B*) is the set of nets connected to LB *B*, *Nets*(*C*) is the set of nets connected to the LBs already selected for cluster *C*, $\alpha$ is a user-defined constant which determines the relative importance of the attraction components, and *G* is a normalizing factor. The first component of T-VPack's second attraction function chooses critical-path LBs, and the second chooses LBs that share many connections with the LBs already packed into the cluster. By initializing and then packing clusters with critical-path LBs, the algorithm is able to absorb long sequences of critical-path LBs into clusters. This minimizes circuit delay since the local interconnect within the cluster is significantly faster than the global interconnect of the FPGA.

```
UnclusteredLBs = PatternMatchToLBs(LUTs, Registers);
LogicClusters = NULL;
while UnclusteredLBs != NULL do
   C = GetLBwithMostUsedInputs(UnclusteredLBs);
   while |C| < k do
      /* cluster is not full */
      BestLB = MaxAttractionLegalLB(C, UnclusteredLBs);
      if BestLB == NULL then
         /* no LB can be added to this cluster */
         break;
      endif
      UnclusteredLBs = UnclusteredLBs - BestLB;
      C = C ∪ BestLB;
   endw
   if |C| < k then
      /* cluster is not full - try hill climbing */
      while |C| < k do
         BestLB = MinClusterInputIncreaseLB(C, UnclusteredLBs);
         C = C ∪ BestLB;
         UnclusteredLBs = UnclusteredLBs - BestLB;
      endw
      if ClusterIsIllegal(C) then
         RestoreToLastLegalState(C, UnclusteredLBs);
      endif
   endif
   LogicClusters = LogicClusters ∪ C;
endw
```

Algorithm 2.1 Pseudo-code of the VPack Algorithm [22]

RPack [43] improves the routability of a circuit by introducing a new set of routability metrics. RPack significantly reduced the channel widths required by circuits compared to VPack. T-RPack [43] is a timing-driven version of RPack which, like T-VPack, gives added weight to grouping LBs on the critical path. iRAC [17] improves the routability of circuits even further by using an attraction function that attempts to encapsulate as many low-fanout nets as possible within a cluster. If a net can be completely encapsulated within a cluster, there is no need to route that net in the external routing network. By encapsulating as many nets as possible within clusters, routability is improved because there are fewer external nets to route in total.
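The attraction function of Eq. 2.1 can be sketched in a few lines of Python. This is an illustrative sketch rather than the T-VPack implementation: the function names, the dictionary layout of the candidates and the example values of $\alpha$ and *G* are all assumptions made here.

```python
def attraction(lb_nets, cluster_nets, crit, alpha, norm):
    """Eq. 2.1: alpha * Crit(B) + (1 - alpha) * |Nets(B) & Nets(C)| / G."""
    shared = len(lb_nets & cluster_nets)
    return alpha * crit + (1.0 - alpha) * shared / norm

def pick_best(candidates, cluster_nets, alpha, norm):
    """candidates maps an LB name to (set of nets, criticality in [0, 1])."""
    return max(
        candidates,
        key=lambda b: attraction(candidates[b][0], cluster_nets,
                                 candidates[b][1], alpha, norm),
    )

cluster_nets = {"n1", "n2", "n3"}
candidates = {
    "B1": ({"n1", "n2"}, 0.2),   # shares two nets, far from the critical path
    "B2": ({"n3"}, 0.9),         # shares one net, nearly critical
}
print(pick_best(candidates, cluster_nets, alpha=0.75, norm=4.0))  # timing dominates
print(pick_best(candidates, cluster_nets, alpha=0.0, norm=4.0))   # connectivity only
```

With $\alpha = 0$ the function reduces to a pure shared-net count, which is the behaviour described for VPack's second attraction function; raising $\alpha$ shifts the choice towards critical-path LBs.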

#### 2.5.3.2 Top-down Approaches

Given a hypergraph, a k-way partitioning assigns each vertex to one of k parts. The k-way partitioning problem seeks to minimize a given cost function of such an assignment. A standard cost function is net cut, which is the number of hyperedges that span more than one partition, or more generally, the sum of the weights of such hyperedges. Constraints are typically imposed on the solution, and make the problem difficult. For example, some vertices can be fixed in their parts, or the total vertex weight in each part can be limited (a balance constraint, which here corresponds to the FPGA cluster size). With balance constraints, the problem of optimally partitioning a hypergraph is known to be NP-hard [85]. However, since partitioning is critical in several practical applications, heuristic algorithms with near-linear runtime have been developed. Such move-based heuristics for k-way hypergraph partitioning appear in [24, 34, 110].
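As a small worked example of the net cut cost function, the sketch below counts the hyperedges whose vertices lie in more than one part, optionally weighting them. The data layout (a list of vertex tuples and a vertex-to-part dictionary) and the name `net_cut` are assumptions chosen for illustration.

```python
def net_cut(hyperedges, part_of, weights=None):
    """Sum the weights (default 1) of hyperedges spanning more than one part."""
    cut = 0
    for idx, edge in enumerate(hyperedges):
        parts = {part_of[v] for v in edge}
        if len(parts) > 1:
            cut += 1 if weights is None else weights[idx]
    return cut

hyperedges = [("a", "b"), ("b", "c", "d"), ("c", "d")]
part_of = {"a": 0, "b": 0, "c": 1, "d": 1}
print(net_cut(hyperedges, part_of))                      # only the 3-pin net is cut
print(net_cut(hyperedges, part_of, weights=[1, 5, 2]))   # weighted version
```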

#### Fiduccia-Mattheyses Algorithm

The Fiduccia-Mattheyses (FM) heuristics [34] work by prioritizing moves by gain. A move changes to which partition a particular vertex belongs, and the gain is the corresponding change of the cost function. After each vertex is moved, gains for connected modules are updated.

```
partitioning = initial_solution;
while solution quality improves do
    Initialize gain_container from partitioning;
    solution_cost = partitioning.get_cost();
    while not all vertices locked do
        move = choose_move();
        solution_cost += gain_container.get_gain(move);
        gain_container.lock_vertex(move.vertex());
        gain_update(move);
        partitioning.apply(move);
    endw
    roll back partitioning to best seen solution;
    gain_container.unlock_all();
endw
```

Algorithm 2.2 Pseudo-code for FM Heuristic [38]

The Fiduccia-Mattheyses (FM) heuristic for partitioning hypergraphs is an iterative improvement algorithm. FM starts with a possibly random solution and changes the solution by a sequence of moves which are organized as passes. At the beginning of a pass, all vertices are free to move (unlocked), and each possible move is labeled with the immediate change to the cost it would cause; this is called the gain of the move (positive gains reduce solution cost, while negative gains increase it). Iteratively, a move with highest gain is selected and executed, and the moving vertex is locked, i.e., is not allowed to move again during that pass. Since moving a vertex can change gains of adjacent vertices, after a move is executed all affected gains are updated. Selection and execution of a best-gain move, followed by gain update, are repeated until every vertex is locked. Then, the best solution seen during the pass is adopted as the starting solution of the next pass. The algorithm terminates when a pass fails to improve solution quality. Pseudo-code for the FM heuristic is given in algorithm 2.2.

Fig. 2.21 The gain bucket structure as illustrated in [34]

The FM algorithm has three main components: (1) the computation of initial gain values at the beginning of a pass; (2) the retrieval of the best-gain (feasible) move; and (3) the update of all affected gain values after a move is made. One contribution of Fiduccia and Mattheyses lies in observing that circuit hypergraphs are sparse, and any move's gain is bounded between plus and minus the maximal vertex degree $G_{max}$ in the hypergraph (times the maximal hyperedge weight, if weights are used). This allows prioritizing moves by their gains. All affected gains can be updated in amortized-constant time, giving overall linear complexity per pass [34]. All moves with the same gain are stored in a linked list representing a "gain bucket". Figure 2.21 presents the gain bucket list structure. It is important to note that some gains *G* may be negative, and as such, FM performs hill-climbing and is not strictly greedy.
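The structure of one FM pass (select the best feasible move, lock the vertex, remember the best solution seen) can be sketched as follows. This is a deliberately simplified illustration for a two-way partition: it recomputes every gain from scratch instead of using the gain bucket structure, so it does not achieve the linear per-pass complexity discussed above. The function names and the `max_part_size` balance rule are assumptions made here.

```python
def cut_size(edges, side):
    """Number of hyperedges with vertices on both sides."""
    return sum(1 for e in edges if len({side[v] for v in e}) > 1)

def fm_pass(edges, side, max_part_size):
    """One FM-style pass on a bipartition; returns (best assignment, its cut)."""
    side = dict(side)                       # work on a copy
    locked = set()
    current_cut = cut_size(edges, side)
    best_side, best_cut = dict(side), current_cut
    while len(locked) < len(side):
        best_move, best_gain = None, None
        for v in side:
            if v in locked:
                continue
            target = 1 - side[v]
            # Balance constraint: the destination part must still have room.
            if sum(1 for u in side if side[u] == target) >= max_part_size:
                continue
            side[v] = target                # tentatively move v
            gain = current_cut - cut_size(edges, side)
            side[v] = 1 - target            # undo the tentative move
            if best_gain is None or gain > best_gain:
                best_move, best_gain = v, gain
        if best_move is None:
            break                           # no feasible move remains
        side[best_move] = 1 - side[best_move]
        locked.add(best_move)               # a vertex moves at most once per pass
        current_cut -= best_gain            # the gain may be negative (hill-climbing)
        if current_cut < best_cut:
            best_cut, best_side = current_cut, dict(side)
    return best_side, best_cut

edges = [("a", "b"), ("c", "d"), ("a", "c")]
start = {"a": 0, "b": 1, "c": 0, "d": 1}
print(fm_pass(edges, start, max_part_size=3))
```

Because moves with zero or negative gain are still executed, the pass can walk through worse solutions; rolling back to the best assignment seen is what keeps the result from degrading.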

#### Multilevel Partitioning

The multilevel hypergraph partitioning framework was successfully verified by [31, 48, 49] and has produced the best known partitioning results ever since. The main advantage of multilevel partitioning over flat partitioners is its ability to search the solution space more effectively by spending comparatively more effort on smaller coarsened hypergraphs. Good coarsening algorithms allow for high correlation between good partitionings of the coarsened hypergraphs and good partitionings of the initial hypergraph. Therefore, a thorough search at the top of the multilevel hierarchy is worthwhile because it is relatively inexpensive when compared to flat partitioning of the original hypergraph, but can still preserve most of the possible improvement.


The result is an algorithmic framework with both improved runtime and solution quality over a completely flat approach. Pseudo-code for an implementation of the multilevel partitioning framework is given in algorithm 2.3.

```
level = 0;
hierarchy[level] = hypergraph;
min_vertices = 200;
while hierarchy[level].vertex_count() > min_vertices do
    next_level = cluster(hierarchy[level]);
    level = level + 1;
    hierarchy[level] = next_level;
endw
partitioning[level] = a random initial solution for top-level hypergraph;
FM(hierarchy[level], partitioning[level]);
while level>0 do
    level = level - 1;
    partitioning[level] = project(partitioning[level+1], hierarchy[level]);
    FM(hierarchy[level], partitioning[level]);
endw
```

Algorithm 2.3 Pseudo-code for the Multilevel Partitioning Algorithm [38]

As illustrated in Fig. 2.22, multilevel partitioning consists of 3 main components: clustering, top-level partitioning and refinement or "uncoarsening". During clustering, hypergraph vertices are combined into clusters based on connectivity, leading to a smaller, clustered hypergraph. This step is repeated until obtaining only several hundred clusters and a hierarchy of clustered hypergraphs. We describe this hierarchy, as shown in Fig. 2.22, with the smaller hypergraphs being "higher" and the larger hypergraphs being "lower". The smallest (top-level) hypergraph is partitioned with a very fast initial solution generator and improved iteratively, for example, using the FM algorithm. The resulting partitioning is then interpreted as a solution for the next hypergraph in the hierarchy. During the refinement stage, solutions are projected from one level to the next and improved iteratively. Additionally, the hMETIS partitioning program [49] introduced several new heuristics that are incorporated into their multilevel partitioning implementation and are reportedly performance critical.
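One clustering (coarsening) step of the kind described above can be sketched as follows. The text only states that vertices are combined "based on connectivity"; the specific rule used here, pairing each unmatched vertex with the unmatched neighbour that shares the most hyperedges with it, is an assumption chosen for illustration and is not claimed to be the rule used by hMETIS or any other tool. The function name `coarsen` is likewise hypothetical.

```python
def coarsen(vertices, hyperedges):
    """Pair up connected vertices; return (vertex -> cluster id, coarse hyperedges)."""
    cluster_of = {}
    next_id = 0
    for v in vertices:
        if v in cluster_of:
            continue
        # Count hyperedges shared with each still-unmatched neighbour.
        shared = {}
        for e in hyperedges:
            if v in e:
                for u in e:
                    if u != v and u not in cluster_of:
                        shared[u] = shared.get(u, 0) + 1
        cluster_of[v] = next_id
        if shared:
            mate = max(sorted(shared), key=shared.get)  # sorted() makes ties deterministic
            cluster_of[mate] = next_id
        next_id += 1
    # Project hyperedges onto clusters; drop those absorbed inside one cluster.
    coarse_edges = []
    for e in hyperedges:
        ce = frozenset(cluster_of[v] for v in e)
        if len(ce) > 1:
            coarse_edges.append(ce)
    return cluster_of, coarse_edges

vertices = ["a", "b", "c", "d"]
hyperedges = [("a", "b"), ("a", "b", "c"), ("c", "d")]
print(coarsen(vertices, hyperedges))
```

Applying such a step repeatedly yields the hierarchy of progressively smaller hypergraphs on which the top-level partitioning and the refinement passes operate.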

#### 2.5.4 Placement

Placement algorithms determine which logic block within an FPGA should implement the corresponding logic block (instance) required by the circuit. The optimization goals are to place connected logic blocks close together to minimize the required wiring (wirelength-driven placement), and sometimes to balance the wiring density across the FPGA (routability-driven placement) or to maximize circuit speed (timing-driven placement). The three major classes of placers in use today are min-cut (partitioning-based) placers [6, 40], analytic placers [32, 53], which are often followed by local iterative improvement, and simulated annealing-based placers [37, 105]. To investigate architectures fairly we must make sure that our CAD tools attempt to use every feature of each FPGA. This means that the optimization approach and goals of the placer may change from architecture to architecture. Partitioning and simulated annealing are the approaches most commonly used in FPGA CAD tools, so we focus on these two techniques in what follows.

Fig. 2.22 Multilevel hypergraph bisection

#### 2.5.4.1 Simulated Annealing Based Approach

Simulated annealing mimics the annealing process used to gradually cool molten metal to produce high-quality metal objects [105]. Pseudo-code for a generic simulated annealing-based placer is shown in algorithm 2.4. A cost function is used to evaluate the quality of a given placement of logic blocks. For example, a common cost function in wirelength-driven placement is the sum over all nets of the half perimeter of their bounding boxes. An initial placement is created by assigning logic blocks randomly to the available locations in the FPGA. A large number of moves, or local improvements, are then made to gradually improve the placement. A logic block is selected at random, and a new location for it is also selected randomly. The change in cost function that results from moving the selected logic block to the proposed new location is computed. If the cost decreases, the move is always accepted and the block is moved. If the cost increases, there is still a chance to accept the move, even though it makes the placement worse. This probability of acceptance is given by $e^{-\frac{\Delta C}{T}}$, where $\Delta C$ is the change in cost function, and *T* is a parameter called temperature that controls the probability of accepting moves that worsen the placement. Initially, *T* is high enough that almost all moves are accepted; it is gradually decreased as the placement improves, in such a way that eventually the probability of accepting a worsening move is very low. This ability to accept hill-climbing moves that make a placement worse allows simulated annealing to escape local minima of the cost function.

```
S = RandomPlacement();
T = InitialTemperature();
R_limit = InitialR_limit();
while ExitCriterion() == false do
    while InnerLoopCriterion() == false do
        S_new = GenerateViaMove(S, R_limit);
        ΔC = Cost(S_new) - Cost(S);
        r = random(0,1);
        if r < e^(-ΔC/T) then
            S = S_new;
        endif
    endw
    T = UpdateTemp();
    R_limit = UpdateR_limit();
endw
```

Algorithm 2.4 Generic Simulated Annealing-based Placer [22]
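The acceptance test at the heart of the annealer can be written as a short Python function. This is a minimal sketch; the function name `accept_move` and the injectable random source `rng` are assumptions made here so that the behaviour can be checked deterministically.

```python
import math
import random

def accept_move(delta_cost, temperature, rng=random):
    """Accept improving moves always, worsening moves with probability e^(-dC/T)."""
    if delta_cost <= 0:
        # e^(-dC/T) >= 1 here, so the test r < e^(-dC/T) would always pass.
        return True
    return rng.random() < math.exp(-delta_cost / temperature)

# The same worsening move (dC = 5) is likely to be accepted while T is high
# and very unlikely once T is low.
for temperature in (100.0, 1.0):
    print(temperature, math.exp(-5.0 / temperature))
```

Handling the improving case separately gives the same decisions as the single comparison in the pseudo-code while avoiding a very large exponent when the cost drops sharply at low temperature.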

The $R_{limit}$ parameter in algorithm 2.4 controls how close together blocks must be to be considered for swapping. Initially, $R_{limit}$ is fairly large, and swaps of blocks far apart on a chip are more likely. Throughout the annealing process, $R_{limit}$ is adjusted to try to keep the fraction of accepted moves at any temperature close to 0.44. If the fraction of moves accepted, $\alpha$, is less than 0.44, $R_{limit}$ is reduced, while if $\alpha$ is greater than 0.44, $R_{limit}$ is increased.
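One simple update rule with this behaviour is sketched below. The multiplicative form $R_{limit} \leftarrow R_{limit} \times (1 - 0.44 + \alpha)$ and the clamping to the range from 1 to the device dimension are assumptions: they match the description above and resemble the rule commonly associated with VPR, but the text itself does not give the formula.

```python
def update_r_limit(r_limit, accept_rate, r_max, target=0.44):
    """Shrink R_limit when too few moves are accepted, grow it when too many are."""
    new_limit = r_limit * (1.0 - target + accept_rate)
    # Keep the range limit between one grid unit and the device size (assumed bounds).
    return max(1.0, min(new_limit, r_max))

print(update_r_limit(10.0, 0.20, r_max=20.0))  # acceptance below 0.44: limit shrinks
print(update_r_limit(10.0, 0.80, r_max=20.0))  # acceptance above 0.44: limit grows
```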

In [22], the objective cost function is a function of the total wirelength of the current placement. The wirelength is an estimate of the routing resources needed to completely route all nets in the netlist. Reductions in wirelength mean fewer routing wires and switches are required to route nets. This point is important because routing resources in an FPGA are limited. Fewer routing wires and switches typically also translate into reductions in the delay incurred in routing nets between logic blocks. The total wirelength of a placement is estimated using a semi-perimeter metric, and is given by Eq. 2.2, where N is the total number of nets in the netlist, $bb_x(i)$ is the horizontal span of net i, $bb_y(i)$ is its vertical span, and q(i) is a correction factor. Figure 2.23 illustrates the calculation of the horizontal and vertical spans of a hypothetical net that has 6 terminals.

$$WireCost = \sum_{i=1}^{N} q(i) \times (bb_x(i) + bb_y(i)) \tag{2.2}$$
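Eq. 2.2 can be evaluated directly from block coordinates, as the sketch below shows. The data layout, the name `wire_cost`, the definition of a span as the difference between the largest and smallest coordinate, and the default correction factor q(i) = 1 are assumptions made for illustration.

```python
def wire_cost(nets, positions, q=None):
    """Eq. 2.2: sum over nets of q(i) * (bb_x(i) + bb_y(i))."""
    total = 0.0
    for i, net in enumerate(nets):
        xs = [positions[block][0] for block in net]
        ys = [positions[block][1] for block in net]
        bb_x = max(xs) - min(xs)          # horizontal span of the bounding box
        bb_y = max(ys) - min(ys)          # vertical span of the bounding box
        factor = 1.0 if q is None else q[i]
        total += factor * (bb_x + bb_y)
    return total

positions = {"A": (0, 0), "B": (3, 1), "C": (1, 4)}
nets = [("A", "B"), ("A", "B", "C")]
print(wire_cost(nets, positions))                 # (3 + 1) + (3 + 4)
print(wire_cost(nets, positions, q=[1.0, 2.0]))   # second net weighted by q = 2
```

In an annealer only the nets attached to the moved blocks change, so in practice the cost difference $\Delta C$ would be computed incrementally rather than by re-evaluating the full sum.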



The temperature decrease rate, the exit criterion for terminating the anneal, the number of moves attempted at each temperature (InnerLoopCriterion), and the method by which potential moves are generated are defined by the annealing schedule. An efficient annealing schedule is crucial to obtain good results in a reasonable amount of CPU time. Many proposed annealing schedules are "fixed" schedules with no ability to adapt to different problems. Such schedules can work well within the narrow application range for which they are developed, but their lack of adaptability means they are not very general. The authors of [86] propose an "adaptive" annealing schedule based on statistics computed during the anneal itself. Adaptive schedules are widely used to solve large-scale optimization problems with many variables.

#### 2.5.4.2 Partitioning Based Approach

Partitioning-based placement methods are based on graph partitioning algorithms such as the Fiduccia-Mattheyses (FM) algorithm [34] and the Kernighan-Lin (KL) algorithm [6]. Partitioning-based placement is well suited to tree-based FPGA architectures. The partitioner is applied recursively to each hierarchical level to distribute netlist cells between clusters. The aim is to reduce external communications and to collect highly connected cells into the same cluster.

Partitioning-based placement is also used for mesh-based FPGAs. The device is divided into two parts, and a circuit partitioning algorithm is applied to determine the part in which a given logic block must be placed so as to minimize the number of nets cut between the partitions, while leaving highly connected blocks in one partition.


A divide-and-conquer strategy is used in these heuristics. By partitioning the problem into sub-parts, a drastic reduction in search space can be achieved. On the whole, these algorithms work in a top-down manner, placing blocks in the general regions to which they should belong. In the mesh FPGA case, partitioning-based placement algorithms are good from a "global" perspective, but they do not actually attempt to minimize wirelength. Therefore, the solutions obtained are sub-optimal in terms of wirelength. However, these classes of algorithms run very fast. They are normally used in conjunction with other search techniques for further quality improvement. Some algorithms [95, 130] combine multi-level clustering and hierarchical simulated annealing to obtain ultra-fast placement with good quality. In the following chapters, the partitioning-based placement approach will be used only for tree-based FPGA architectures.

#### 2.5.5 Routing

The FPGA routing problem consists of assigning nets to routing resources such that no routing resource is shared by more than one net. *Pathfinder* [80] is the current state-of-the-art FPGA routing algorithm. *Pathfinder* operates on a directed graph abstraction G(V, E) of the routing resources in an FPGA. The set of vertices V in the graph represents the IO terminals of logic blocks and the routing wires in the interconnect structure. An edge between two vertices represents a potential connection between them. Figure 2.24 presents a part of a routing graph in a mesh-based interconnect.

Given this graph abstraction, the routing problem for a given net is to find a directed tree embedded in G that connects the source terminal of the net to each of its sink terminals. Since the number of routing resources in an FPGA is limited, the goal of finding unique, non-intersecting trees for all the nets in a netlist is a difficult problem.

*Pathfinder* uses an iterative, negotiation-based approach to successfully route all the nets in a netlist. During the first routing iteration, nets are freely routed without paying attention to resource sharing. Individual nets are routed using *Dijkstra*'s shortest path algorithm [111]. At the end of the first iteration, resources may be congested because multiple nets have used them. During subsequent iterations, the cost of using a resource is increased, based on the number of nets that share the resource, and the history of congestion on that resource. Thus, nets are made to negotiate for routing resources. If a resource is highly congested, nets which can use lower congestion alternatives are forced to do so. On the other hand, if the alternatives are more congested than the resource, then a net may still use that resource.

The cost of using a routing resource *n* during a routing iteration is given by Eq. 2.3.

$$c_n = (b_n + h_n) \times p_n \tag{2.3}$$



Fig. 2.24 Modeling FPGA architecture as a directed graph [22]

where $b_n$ is the base cost of using the resource n, $h_n$ is related to the history of congestion during previous iterations, and $p_n$ is proportional to the number of nets sharing the resource in the current iteration. The $p_n$ term represents the cost of using a shared resource n, and the $h_n$ term represents the cost of using a resource that has been shared during earlier routing iterations. The latter term is based on the intuition that a historically congested node should appear expensive, even if it is only slightly shared currently. Cost functions and the routing schedule are described in detail in [22]. The pseudo-code of the *Pathfinder* routing algorithm is presented in algorithm 2.5.

```
Let: RT_i be the set of nodes in the current routing of net i
while shared resources exist do
   /* illegal routing */
   foreach net i do
      rip-up routing tree RT_i;
      RT_i = s_i;
      foreach sink t_ij do
         Initialize priority queue PQ to RT_i at cost 0;
         while sink t_ij not found do
            Remove lowest cost node m from PQ;
            foreach fanout node n of node m do
               Add n to PQ at PathCost(n) = c_n + PathCost(m);
            endfch
         endw
         foreach node n in path t_ij to s_i do
            /* backtrace */
            Update c_n;
            Add n to RT_i;
         endfch
      endfch
   endfch
   update h_n for all n;
endw
```

Algorithm 2.5 Pseudo-code of the Pathfinder Routing Algorithm [80]

An important measure of the routing quality produced by an FPGA routing algorithm is the critical path delay. The critical path delay of a routed netlist is the maximum delay of any combinational path in the netlist. The maximum frequency at which a netlist can be clocked has an inverse relationship with critical path delay. Thus, larger critical path delays slow down the operation of the netlist. Delay information is incorporated into *Pathfinder* by redefining the cost of using a resource n (Eq. 2.4).

$$c_n = A_{ij} \times d_n + (1 - A_{ij}) \times (b_n + h_n) \times p_n \tag{2.4}$$

Here the $(b_n + h_n) \times p_n$ term is the congestion cost from Eq. 2.3, $d_n$ is the delay incurred in using the resource, and $A_{ij}$ is the criticality given by Eq. 2.5.

$$A_{ij} = \frac{D_{ij}}{D_{max}} \tag{2.5}$$

 $D_{ij}$  is the maximum delay of any combinational path going through the source and sink terminals of the net being routed, and  $D_{max}$  is the critical path delay of the netlist. Equation 2.4 is formulated as a sum of two cost terms. The first term in the equation represents the delay cost of using resource *n*, while the second term represents the congestion cost. When a net is routed, the value of  $A_{ij}$  determines whether the delay or the congestion cost of a resource dominates. If a net is near critical (i.e. its  $A_{ij}$  is close to 1), then congestion is largely ignored and the cost of using a resource is primarily determined by the delay term. If the criticality of a net is low, the congestion term in Eq. 2.4 dominates, and the route found for the net avoids congestion while potentially incurring delay.
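The trade-off expressed by Eqs. 2.3 to 2.5 is easy to see numerically. The sketch below evaluates the two cost formulas for one resource; the function names and the sample values are assumptions chosen for illustration, and how $h_n$ and $p_n$ are themselves updated between iterations is left out because the text does not specify it.

```python
def congestion_cost(b_n, h_n, p_n):
    """Eq. 2.3: c_n = (b_n + h_n) * p_n."""
    return (b_n + h_n) * p_n

def timing_driven_cost(d_n, b_n, h_n, p_n, d_ij, d_max):
    """Eq. 2.4 with the criticality A_ij = D_ij / D_max of Eq. 2.5."""
    a_ij = d_ij / d_max
    return a_ij * d_n + (1.0 - a_ij) * congestion_cost(b_n, h_n, p_n)

# Same resource seen by a critical net (A_ij = 1) and a non-critical one (A_ij = 0.1).
resource = dict(d_n=0.2, b_n=1.0, h_n=0.5, p_n=2.0)
print(timing_driven_cost(**resource, d_ij=10.0, d_max=10.0))  # delay term only
print(timing_driven_cost(**resource, d_ij=1.0, d_max=10.0))   # mostly congestion
```

A critical net therefore sees this congested resource as cheap and keeps using it, while a non-critical net sees it as expensive and is pushed towards less congested alternatives.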

*Pathfinder* has proved to be one of the most powerful FPGA routing algorithms to date. The negotiation-based framework that trades off delay for congestion is an extremely effective technique for routing signals on FPGAs. More importantly, *Pathfinder* is a truly architecture-adaptive routing algorithm. The algorithm operates on a directed graph abstraction of an FPGA's routing structure, and can thus be used to route netlists on any FPGA that can be represented as a directed routing graph.

#### 2.5.6 Timing Analysis

Timing analysis [99] is used for two basic purposes:

- To determine the speed of circuits which have been completely placed and routed,
- To estimate the slack [68] of each source-sink connection during routing (placement and other parts of the CAD flow) in order to decide which connections must be made via fast paths to avoid slowing down the circuit.

First, the circuit under consideration is represented as a directed graph. Nodes in the graph represent input and output pins of circuit elements such as LUTs, registers, and I/O pads. Connections between these nodes are modeled with edges in the graph. Edges are added between the inputs of combinational logic blocks (LUTs) and their outputs. These edges are annotated with a delay corresponding to the physical delay between the nodes. Register input pins are not joined to register output pins. To determine the delay of the circuit, a breadth-first traversal is performed on the graph starting at the sources (input pads and register outputs). Then the arrival time, $T_{arrival}$, at all nodes in the circuit is computed with the following equation:

$$T_{arrival}(i) = \max_{j \in fanin(i)} \{T_{arrival}(j) + delay(j, i)\}$$

where node *i* is the node currently being computed, and delay(j, i) is the delay value of the edge joining node *j* to node *i*. The delay of the circuit is then the maximum arrival time,  $D_{max}$ , of all nodes in the circuit.

To guide a placement or routing algorithm, it is useful to know how much delay may be added to a connection before the path that the connection is on becomes critical. The amount of delay that may be added to a connection before it becomes critical is called the slack of that connection. To compute the slack of a connection, one must compute the required arrival time,  $T_{required}$ , at every node in the circuit. We first set the  $T_{required}$  at all sinks (output pads and register inputs) to be  $D_{max}$ . Required arrival time is then propagated backwards starting from the sinks with the following equation:

$$T_{required}(i) = \min_{j \in fanout(i)} \{T_{required}(j) - delay(i, j)\}$$

Finally, the slack of a connection (i, j) driving node j is defined as:

$$Slack(i, j) = T_{required}(j) - T_{arrival}(i) - delay(i, j)$$
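The three equations above translate into a forward pass, a backward pass and a per-edge subtraction. The following Python sketch assumes the caller supplies the nodes in topological order (standing in for the breadth-first traversal from the sources) and that all sources have an arrival time of zero; the function name and data layout are likewise assumptions.

```python
from collections import defaultdict

def timing_analysis(edges, order):
    """edges maps (i, j) to delay(i, j); order lists the nodes topologically."""
    fanin, fanout = defaultdict(list), defaultdict(list)
    for (i, j) in edges:
        fanin[j].append(i)
        fanout[i].append(j)
    # Forward pass: sources (no fan-in) are assumed to arrive at time 0.
    arrival = {}
    for n in order:
        arrival[n] = max((arrival[j] + edges[(j, n)] for j in fanin[n]), default=0.0)
    d_max = max(arrival.values())
    # Backward pass: sinks (no fan-out) are required at D_max.
    required = {}
    for n in reversed(order):
        required[n] = min((required[j] - edges[(n, j)] for j in fanout[n]), default=d_max)
    slack = {(i, j): required[j] - arrival[i] - d for (i, j), d in edges.items()}
    return arrival, required, slack, d_max

edges = {("in1", "g"): 1.0, ("in2", "g"): 2.0, ("g", "out"): 3.0, ("in2", "out2"): 1.0}
order = ["in1", "in2", "g", "out", "out2"]
arrival, required, slack, d_max = timing_analysis(edges, order)
print(d_max, slack)
```

Connections with zero slack lie on the critical path; in this example these are in2 to g and g to out, while the connection from in2 to out2 can absorb four extra delay units before it becomes critical.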

#### 2.5.7 Bitstream Generation

Once a netlist is placed and routed on an FPGA, bitstream information is generated for the netlist. This bitstream is programmed on the FPGA using a bitstream loader. The bitstream of a netlist specifies which SRAM bits of the FPGA are to be programmed to 0 and which to 1. The bitstream generator reads the technology mapping, packing and placement information to program the SRAM bits of the Look-Up Tables. The routing information of a netlist is used to correctly program the SRAM bits of connection boxes and switch boxes.

#### 2.6 Research Trends in Reconfigurable Architectures

So far, this chapter has presented a detailed overview of the logic architecture, routing architecture and software flow of FPGAs. In this section, we highlight some of the disadvantages associated with FPGAs and describe some of the trends currently being followed to remedy them. FPGA-based products are very effective for low to medium volume production, as they are easy to program and debug, and have lower NRE cost and faster time-to-market. All these major advantages of an FPGA come from its reconfigurability, which makes it general purpose and field programmable. However, the very same reconfigurability is the major cause of its disadvantages: it makes an FPGA larger, slower and more power consuming than an ASIC.

However, the continued scaling of CMOS and increased integration has resulted in a number of alternative architectures for FPGAs. These architectures are mainly aimed to improve area, performance and power consumption of FPGA architectures. Some of these propositions are discussed in this section.

#### 2.6.1 Heterogeneous FPGA Architectures

The use of hard-blocks in FPGAs improves their logic density, performance and power consumption. There can be different types of hard-blocks, such as multipliers, adders, memories, floating point units and DSP blocks. In this regard, [19] have incorporated embedded floating-point units in FPGAs, and [30] have developed a virtual embedded block methodology to model arbitrary embedded blocks on existing commercial FPGAs. Some of the academic and commercial architectures that make use of hard-blocks to improve the overall efficiency of FPGAs are presented here.

#### 2.6.1.1 Versatile Packing, Placement and Routing (VPR)

Versatile Packing, Placement and Routing for FPGAs (commonly known as VPR) [14, 22, 120] is the most widely used academic mesh-based FPGA exploration environment. It allows mesh-based FPGA architectures to be explored empirically: benchmark circuits are mapped, placed and routed on a desired FPGA architecture, and the area and delay of the resulting FPGA are then measured to decide the best architectural parameters. The CAD tools in VPR are highly optimized to ensure high quality results.

Earlier versions of VPR supported only homogeneous architectures [120]. However, the latest version of VPR, known as VPR 5.0 [81], supports hard-blocks (such as multiplier and memory blocks) and single-driver routing wires. Hard-blocks are restricted to columns that are one grid unit wide, and each such column can contain only one type of block. The height of a hard-block is quantized and must be an integral multiple of grid units. If the height of the FPGA is not an integral multiple of the block height, some grid locations are left empty. Figure 2.25 illustrates a heterogeneous FPGA with 8 different kinds of blocks.
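The column restriction can be illustrated with a small calculation. The sketch below is not VPR code; the function name and the example dimensions are hypothetical, and heights are measured in grid units.

```python
def fill_hard_block_column(fpga_height, block_height):
    """Return (blocks_placed, empty_grid_locations) for one hard-block column.

    Both heights are given in grid units; blocks are stacked from one end
    of the column and any leftover grid locations stay empty.
    """
    if block_height <= 0 or fpga_height < block_height:
        raise ValueError("block must be at least 1 unit high and fit in the column")
    return fpga_height // block_height, fpga_height % block_height


# Hypothetical example: blocks 4 grid units high in a column 10 units high.
example_column = fill_hard_block_column(10, 4)
```

For this example two blocks fit in the column and two grid locations are left empty, whereas a column height of 12 would be filled completely by three blocks.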



#### 2.6.1.2 Madeo, a Framework for Exploring Reconfigurable Architectures

Madeo [73] is another academic design suite for the exploration of reconfigurable architectures. It includes a modeling environment that supports multi-grained, heterogeneous architectures with irregular topologies. The Madeo framework first allows an FPGA architecture to be modeled; the architecture characteristics are represented as a common abstract model. Once the architecture is defined, the CAD tools of Madeo are used to map a target netlist on the architecture. Madeo uses the same placement and routing algorithms as VPR [120]. Along with the placement and routing algorithms, its design suite also embeds a bitstream generator, a netlist simulator, and a physical layout generator. Madeo supports architectural prospection and very fast FPGA prototyping. Several FPGAs, including some commercial architectures (such as the Xilinx Virtex family) and prospective ones (such as STMicro LPPGA), have been modeled using Madeo. The physical layout is produced as a VHDL description.

#### 2.6.1.3 Altera Architecture

Altera's Stratix IV [107] is an example of a commercial architecture that uses a heterogeneous mixture of blocks. Figure 2.26 shows the global architectural layout of Stratix IV. The logic structure of Stratix IV consists of LABs (Logic Array Blocks), memory blocks and digital signal processing (DSP) blocks. LABs are distributed symmetrically in rows and columns and are used to implement general purpose logic. The DSP blocks are used to implement full-precision multipliers of different granularities. The memory blocks and DSP blocks are placed in columns at equal distances from one another. Input/output blocks (I/Os) are located at the periphery of the architecture.

Fig. 2.26 Stratix IV architectural elements

Logic array blocks (LABs) and adaptive logic modules (ALMs) provide the basic logic capacity of a Stratix IV device. They can be used to configure logic functions, arithmetic functions, and register functions. Each LAB consists of ten ALMs, carry chains, arithmetic chains, LAB control signals, local interconnect, and register chain connection lines. The local interconnect connects the ALMs that are inside the same LAB. The direct link allows a LAB to drive into the local interconnect of its left or right neighboring LAB. The register chain connects the output of an ALM register to the adjacent ALM register in the LAB. A memory LAB (MLAB) is a derivative of the LAB which can be used either just like a simple LAB or as a static random access memory (SRAM). Each ALM in an MLAB can be configured as a $64 \times 1$ or $32 \times 2$ block, resulting in a $64 \times 10$ or $32 \times 20$ simple dual-port SRAM block per MLAB. MLAB and LAB blocks always coexist as pairs in Stratix IV families.
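The MLAB sizes quoted above follow directly from the ten ALMs per LAB. The short sketch below, with hypothetical names, simply places the ten per-ALM memories side by side and checks the resulting capacity.

```python
ALMS_PER_LAB = 10  # from the text: each LAB consists of ten ALMs

# Per-ALM memory modes from the text, as (words, bits per word).
ALM_MEMORY_MODES = [(64, 1), (32, 2)]


def mlab_modes():
    """MLAB organisations obtained by placing the ten ALMs side by side."""
    return [(words, width * ALMS_PER_LAB) for words, width in ALM_MEMORY_MODES]


mlab_bits = [words * width for words, width in mlab_modes()]
```

Both organisations give 640 bits per MLAB.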

The DSP blocks in Stratix IV are optimized for signal processing applications such as Finite Impulse Response (FIR) filters, Infinite Impulse Response (IIR) filters, Fast Fourier Transform (FFT) functions and encoders. A Stratix IV device has two to seven columns of DSP blocks that can implement different operations such as multiplication, multiply-add, multiply-accumulate (MAC) and dynamic arithmetic or logical shift functions. The DSP block supports $9 \times 9$, $12 \times 12$, $18 \times 18$ and $36 \times 36$ multiplication operations. Stratix IV devices contain three different sizes of embedded SRAM: 640-bit memory logic array blocks (MLABs), 9-Kbit M9K blocks, and 144-Kbit M144K blocks. The MLABs have been optimized to implement filter delay lines, small FIFO buffers, and shift registers. M9K blocks can be used for general purpose memory applications, and M144K blocks are generally meant for processor code storage, packet buffering or video frame buffering.

#### 2.6.2 FPGAs to Structured Architectures

The ease of designing and prototyping with FPGAs can be exploited to quickly design a hardware application on an FPGA. Later, improvements in area, speed, power and volume production can be achieved by migrating the application design from the FPGA to other technologies such as Structured-ASICs. In this regard, Altera provides a facility to migrate Stratix IV based application designs to its Structured-ASIC, called HardCopy IV [56]. The main idea is to design, test and even initially ship a design using an FPGA. Later, the application circuit that is mapped on the FPGA can be seamlessly migrated to HardCopy for high volume production. The latest HardCopy IV devices offer pin-to-pin compatibility with the Stratix IV prototype, making them exact replacements for the FPGAs. Thus, the same system board and software developed for prototyping and field trials can be retained, enabling the lowest risk and fastest time-to-market for high-volume production. Moreover, when an application circuit is migrated from a Stratix IV FPGA prototype to HardCopy IV, the core logic performance doubles and power consumption is halved.

The basic logic unit of HardCopy is termed HCell. It is similar to the Stratix IV logic cell (LAB) in the sense that the fabric consists of a regular pattern which is formed by tiling one or more basic cells in a two dimensional array. The difference, however, is that an HCell has no configuration memory. Different HCell candidates can be used, ranging from fine-grained NAND gates to multiplexers and coarse-grained LUTs. An array of such HCells, and a general purpose routing network which interconnects them, is laid down on the lower layers of the chip. Specific layers are then reserved to form via connections or metal lines which are used to customize the generic array into specific functionality. Figure 2.27 illustrates the correspondence between an FPGA and a compatible structured ASIC. There is a one-to-one layout-level correspondence between MRAMs, phase-lock loops (PLLs), embedded memories, transceivers, and I/O blocks. The soft-logic DSP multipliers and logic cell fabric of the FPGA are re-synthesized to the structured ASIC fabric. However, they remain functionally and electrically equivalent in FPGAs and HardCopy ASICs.

Apart from Altera, several other companies provide a similar solution. For example, the eASIC Nextreme [41] uses an FPGA-like design flow to map an application design on SRAM programmable LUTs, which are later interconnected through mask programming of a few upper routing layers. Tierlogic [113] is a recently launched FPGA vendor that offers 3D SRAM-based TierFPGA devices for prototyping and early production. The same design solution can be frozen to a TierASIC device with one low-NRE custom mask for error-free transition to an ASIC implementation. The SRAM layer is placed on an upper 3D layer of the TierFPGA. Once the TierFPGA design is frozen, the bitstream information is used to create a single custom mask metal layer that replaces the SRAM programming layer.



Fig. 2.27 FPGA/Structured-ASIC (HardCopy) Correspondence [59]

#### 2.6.3 Configurable ASIC Cores

Configurable ASIC Core (cASIC) [35] is another example of a reconfigurable device; it can implement a limited set of circuits which operate at mutually exclusive times. cASICs are intended as accelerators in domain-specific systems-on-a-chip, and are not designed to replace the entire ASIC-only chip. The host executes software code, whereas compute-intensive sections can be executed on one or more cASICs. To execute these compute-intensive sections, cASICs implement only data-path circuits and thus support full-word blocks only (such as 16-bit wide multipliers, adders, RAMs, etc.). Since the application domain of cASICs is more specific, they are significantly smaller than FPGAs. As hardware resources are shared between different netlists, cASICs are even smaller than the sum of the standard-cell based ASIC areas of the individual circuits.

#### 2.6.4 Processors Inside FPGAs

A considerable amount of FPGA area can be saved by incorporating a microprocessor in an FPGA. The microprocessor can execute the less compute-intensive tasks, whereas the compute-intensive tasks can be executed on the FPGA fabric. Similarly, a microprocessor-based application can gain a large speed-up if an FPGA is attached to it: the FPGA can execute any compute-intensive functionality as a customized hardware instruction. These advantages have compelled commercial FPGA vendors to provide microprocessors in their FPGAs so that a complete system can be programmed on a single chip. A few vendors have integrated a fixed hard processor on their FPGAs (like the AVR processor integrated in Atmel FPSLIC [18] or the PowerPC processors embedded in Xilinx Virtex-4 [126]). Others provide soft processor cores that are highly optimized to be mapped on the programmable resources of the FPGA. Altera's Nios [90] and Xilinx's Microblaze [88] are soft processors meant for FPGA designs which allow custom hardware instructions. [96] have shown that considerable area gains can be achieved if these soft processors are optimized for particular applications: unused instructions can be removed and different architectural tradeoffs can be selected to achieve, on average, a 25% area gain for soft processors required for specific applications. Reconfigurable units can also be attached to microprocessors to achieve execution-time speedup in software programs, as done in [28, 70, 104].

#### 2.6.5 Application Specific FPGAs

The type of logic blocks and the routing network in an FPGA can be optimized to gain area and performance advantages for a given application domain (control-path-oriented applications, datapath-oriented applications, etc.). Such FPGAs may include a different variety of desired hard-blocks, the amount of flexibility appropriate for the given application domain, or bus-based interconnects rather than bit-based interconnects. Authors in [83] have presented a reconfigurable arithmetic array for multimedia applications, which they call CHESS. The principal goal of CHESS was to increase arithmetic computational density, to enhance flexibility, and to increase the bandwidth and capacity of internal memories significantly beyond the capabilities of existing commercial FPGAs. These goals were achieved by proposing an array of ALUs with embedded RAMs, where each ALU is 4 bits wide and supports 16 instructions. Similarly, authors in [42] present a coarse-grained, field programmable architecture for constructing deep computational pipelines. This architecture can efficiently implement applications related to media, signal processing, scientific computing and communications. Further, authors in [128] have used bus-based routing and logic blocks to improve the density of FPGAs for datapath circuits. This is a partial multi-bit FPGA architecture that is designed to exploit the regularity that most datapath circuits exhibit.

#### 2.6.6 Time-Multiplexed FPGAs

Time-multiplexed FPGAs increase the capacity of FPGAs by executing different portions of a circuit in a time-multiplexed mode [89, 114]. An application design is divided into different sub-circuits, and each sub-circuit runs as an individual context of the FPGA. The state information of each sub-circuit is saved in context registers before a new context runs on the FPGA. Authors in [114] have proposed a time-multiplexed FPGA architecture where a large circuit is divided into sub-circuits and each sub-circuit is sequentially executed on a time-multiplexed FPGA. Such an FPGA stores a set of configuration bits for all contexts. A context switch is made simply by using the SRAM bits dedicated to the new context. The combinatorial and sequential outputs of a sub-circuit that are required by other sub-circuits are saved in context registers, which can be easily accessed by sub-circuits at different times.

Time-multiplexed FPGAs increase their capacity by adding more SRAM bits rather than more CLBs; logic capacity increases because the hardware is reused dynamically. Only the configuration bits of the currently executing context are active; the configuration bits of the remaining supported contexts are inactive. Intermediate results are saved and then shared with the contexts still to be run. Each context takes one micro-cycle to execute, and the sum of the micro-cycles of all the contexts makes one user-cycle. The entire time-multiplexed FPGA, or a smaller portion of it, can be configured to (i) execute a single design, where each context runs a sub-design, (ii) execute multiple designs in time-multiplexed mode, or (iii) statically execute only one single design. Tabula [109] is a recently launched FPGA vendor that provides time-multiplexed FPGAs. It dynamically reconfigures logic, memory, and interconnect at multi-GHz rates with a Spacetime compiler.
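The relation between contexts, context registers and the user-cycle can be mimicked in software. The sketch below is a simplified behavioural model, not a description of any real device: the function name, the dictionary used as context registers and the micro-cycle values are all assumptions made for illustration.

```python
def run_user_cycle(contexts, inputs):
    """Execute all contexts once, in order, on the same 'hardware'.

    `contexts` is a list of (micro_cycle_ns, function) pairs. Each function
    reads the shared context registers (a dict) and returns a dict of
    results to be saved for the contexts still to be run.
    Returns the final registers and the user-cycle time.
    """
    registers = dict(inputs)
    user_cycle_ns = 0.0
    for micro_cycle_ns, context in contexts:
        registers.update(context(registers))  # save intermediate results
        user_cycle_ns += micro_cycle_ns       # user-cycle = sum of micro-cycles
    return registers, user_cycle_ns


# Hypothetical design (a + b) * c split into two sub-circuits (contexts).
example_contexts = [
    (2.0, lambda regs: {"s": regs["a"] + regs["b"]}),
    (3.0, lambda regs: {"p": regs["s"] * regs["c"]}),
]
example_regs, example_user_cycle = run_user_cycle(
    example_contexts, {"a": 2, "b": 3, "c": 4}
)
```

The second context only works because the first one saved `s` in the shared registers, which is the role the context registers play in the architecture described above.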

#### 2.6.7 Asynchronous FPGA Architecture

Another approach that has been proposed to improve the overall performance of FPGA architectures is the use of asynchronous design elements. Conventionally, digital circuits are designed for synchronous operation, and in turn FPGA architectures have focused primarily on implementing synchronous circuits. Asynchronous designs have been proposed to improve the energy efficiency of FPGAs, since they offer potentially lower energy consumption: energy is consumed only when necessary. Asynchronous architectures can also simplify the design process, as complex clock distribution networks become unnecessary.

The first asynchronous FPGA was developed by [57]. It consisted of a modified version of a previously developed synchronous FPGA architecture. Its logic block was similar to the conventional logic block, with the added features of fast feedback and a latch that could be used to initialize an asynchronous circuit. Another asynchronous architecture was proposed in [112]. This architecture is designed specifically for dataflow applications. Its logic block is similar to that of a synchronous architecture, but it also contains units such as a split unit, which enables conditional forwarding of data, and a merge unit, which allows conditional selection of data from different sources. An alternative to fully asynchronous design is the globally asynchronous, locally synchronous (GALS) approach. This approach is used by [69], where the authors have introduced a level of hierarchy into the FPGA architecture. Standard hard or soft synchronous logic blocks are grouped together to form large synchronous blocks, and communication between these blocks is done asynchronously. More recently, authors in [131] have applied the GALS approach to Network on Chip architectures to improve the performance, energy consumption and yield of future architectures in a synergistic manner.

It is clear that, despite each architecture offering its own benefits, a number of architectural questions remain unresolved for asynchronous FPGAs. Many architectures rely on logic blocks similar to those used for synchronous designs [57, 69] and, therefore, the same architectural issues such as LUT size, cluster size, and routing topology must be investigated. In addition to those questions, asynchronous FPGAs also add the challenge of determining the appropriate synchronization methodology.

#### 2.7 Summary and Conclusion

This chapter first gave a brief introduction to the traditional logic and routing architectures of FPGAs. It then detailed the different steps involved in the FPGA design flow. Finally, it described various approaches that have been employed to reduce a few disadvantages of FPGAs and ASICs, with or without compromising their major benefits. Figure 2.28 presents a rough comparison of different solutions used to reduce the drawbacks of FPGAs and ASICs. The remaining chapters of this book focus on the exploration of tree-based FPGA architectures using hard-blocks, tree-based Application Specific Inflexible FPGAs (ASIFs), and their automatic layout generation methods.

This book presents a new environment for the exploration of tree-based heterogeneous FPGAs. This environment is used to explore different architecture techniques for tree-based heterogeneous FPGA architectures. This book also presents an optimized environment for mesh-based heterogeneous FPGAs. Further, the environments of the two architectures are evaluated through experimental results obtained by mapping a number of heterogeneous benchmarks on the two architectures.

Altera [11] has proposed a new idea: prototype, test, and even ship the first few designs on an FPGA; later, the FPGA-based design can be migrated to a Structured-ASIC (known as HardCopy). However, migration of an FPGA-based product to a Structured-ASIC supports only a single application design. An ASIF retains this property, and can be a possible future extension for the migration of FPGA-based applications to Structured-ASIC. Thus, when an FPGA-based product is in the final phase of its development cycle, and the set of circuits to be mapped on the FPGA is known, the FPGA can be reduced to an ASIF for the given set of application designs. This book presents a new tree-based ASIF and compares it in detail with a mesh-based ASIF. This book also presents automatic layout generation techniques for domain-specific FPGA and ASIF architectures.

Fig. 2.28 Comparison of different solutions used to reduce ASIC and FPGA drawbacks



http://www.springer.com/978-1-4614-3593-8

Tree-based Heterogeneous FPGA Architectures Application Specific Exploration and Optimization Farooq, U.; Marrakchi, Z.; Mehrez, H. 2012, XVI, 188 p., Hardcover ISBN: 978-1-4614-3593-8



### ■ FLEX 8000 chip contains 26–162 LABs

- Each LAB contains 8 Logic Elements (LEs), so a chip contains 208–1296 LEs, totaling 2,500–16,000 usable gates
- LABs arranged in rows and columns, connected by FastTrack Interconnect, with I/O elements (IOEs) at the edges

### Altera FLEX 8000 Logic Array Block



Figure from Altera technical literature

### LAB = 8 LEs, plus local interconnect, control signals, carry & cascade chains

### **Altera FLEX 8000 Logic Element**



- Each Logic Element (LE) contains:
  - 4-input Look-Up Table (LUT)
    - Can produce any function of 4 variables
  - Programmable flip-flop
    - Can configure as D, T, JK, SR, or bypass
    - Has clock, clear, and preset signals that can come from dedicated inputs, I/O pins, or other LEs
  - Carry chain & cascade chain

### Altera FLEX 8000 Carry Chain (Example: n-bit adder)



- *Carry chain* provides very fast (< 1 ns) carry-forward between LEs
- Feeds both LUT and next part of chain
- Good for high-speed adders & counters
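The n-bit adder example can be modelled in software to show how the carry chain links one LE per bit. This is a behavioural sketch only, assuming each LE uses two 3-input LUTs (one for the sum bit, one for the carry bit); the names and LUT encodings are made up for illustration and are not taken from Altera documentation.

```python
def make_lut3(func):
    """Truth table of a 3-input LUT, indexed by (a, b, c) as a 3-bit address."""
    return [func((i >> 2) & 1, (i >> 1) & 1, i & 1) for i in range(8)]


SUM_LUT = make_lut3(lambda a, b, c: a ^ b ^ c)
CARRY_LUT = make_lut3(lambda a, b, c: (a & b) | (a & c) | (b & c))


def le_add_bit(a, b, carry_in):
    """One LE: the sum LUT drives the LE output, the carry LUT drives the chain."""
    index = (a << 2) | (b << 1) | carry_in
    return SUM_LUT[index], CARRY_LUT[index]


def carry_chain_add(x, y, n_bits):
    """n-bit adder built from n LEs linked by the carry chain."""
    carry, result = 0, 0
    for bit in range(n_bits):
        s, carry = le_add_bit((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result, carry
```

Calling `carry_chain_add(5, 3, 4)` walks the carry through four LEs and returns the 4-bit sum together with the final carry-out.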

### Altera FLEX 8000 Cascade Chain



Figure from Altera technical literature

# Cascade chain provides wide fan-in

- Adjacent LE's LUTs can compute parts of the function in parallel; cascade chain then serially connects intermediate values
- Can use either a logical AND or a logical OR (using DeMorgan's theorem) to connect outputs of adjacent LEs
- Each additional LE provides 4 more inputs to the width of the function
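A small behavioural model makes the wide fan-in idea concrete. In the sketch below, which uses hypothetical names and is not a model of the actual circuit timing, each LE's 4-input LUT computes a partial result and the cascade chain combines the partial results with AND or OR.

```python
def cascade_chain(inputs, lut_func, combine="and"):
    """Evaluate a wide function using one 4-input LUT per group of 4 inputs.

    Returns (output, number_of_LEs_used). `lut_func` is the 4-input function
    programmed into every LUT; `combine` selects the cascade gate.
    """
    if combine not in ("and", "or"):
        raise ValueError("combine must be 'and' or 'or'")
    if len(inputs) == 0 or len(inputs) % 4 != 0:
        raise ValueError("this sketch expects a multiple of 4 inputs")

    # All LUTs work in parallel on their own 4 inputs.
    groups = [inputs[i:i + 4] for i in range(0, len(inputs), 4)]
    partials = [int(bool(lut_func(*group))) for group in groups]

    # The cascade chain then connects the intermediate values serially.
    value = partials[0]
    for partial in partials[1:]:
        value = (value & partial) if combine == "and" else (value | partial)
    return value, len(groups)


def and4(a, b, c, d):
    return a & b & c & d


def or4(a, b, c, d):
    return a | b | c | d
```

A 12-input AND, for example, is evaluated as `cascade_chain(bits, and4)` and uses three LEs, in line with the rule that each additional LE adds four inputs.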

# Altera FLEX 8000 LE Operating Modes



### Each mode uses LE resources differently

- 7 out of 10 inputs (4 data from LAB local interconnect, feedback from register, and carry-in & cascade-in) go to specific destinations to implement the function
- Remaining 3 provide clock, clear, and preset for register

# Altera FLEX 8000 Operating Modes (cont.)

- Normal mode
  - Used for general logic applications, and wide decoding functions that can benefit from the cascade chain
- Arithmetic mode
  - Provides two 3-input LUTs to implement adders, accumulators, and comparators
    - One LUT provides a 3-bit function
    - Other LUT generates a carry bit
- Up/down counter mode
  - Provides counter enable, synchronous up / down control, and data loading options
  - Uses two 3-input LUTs
    - One LUT generates counter data
    - Other LUT generates fast carry bit
  - Uses a 2-to-1 multiplexer for synchronous data loading, and clear and preset for asynchronous data loading

Fall 2004, Lecture 21

### Altera FLEX 8000 FastTrack Interconnect



Note:

(1) See Table 4 for the number of row channels.

### Device-wide rows and columns

- Each LE in LAB drives 2 column (total 16) channels, which connects... that column
- Each LE in LAB drives 1 row channel, which connects to other LABs in that row
  - 3-to-1 muxes connect either LE outputs or column channels to row channels

# Altera FLEX 8000 I/O Elements



- Eight I/O Elements (IOEs) are at the end of each row and column
  - Some restrictions on how many row / column channels each IOE connects to
  - Contains a register that can be used for either input or output
    - Associated I/O pins can be used as either input, output, or bidirectional pins

# **Altera FLEX 8000 Configuration**

- Loading the FLEX 8000's SRAM with programming information is called *configuration*, and takes about 100ms
  - After configuration, the device initializes itself (resets its registers, enables its I/O pins, and begins normal operation)
  - Configuration & initialization = command mode, normal operation = user mode
- Six configuration schemes are available:
  - Active serial: FLEX 8000 gives the configuration EPROM clock signals (not addresses), and keeps getting new values in sequence
  - Active parallel up, active parallel down: FLEX 8000 gives the configuration EPROM a sequence of addresses to read data from
  - Passive parallel synchronous, passive parallel asynchronous, passive serial: FLEX 8000 passively receives data from some host




# Altera FLEX 10K Block Diagram



### FLEX 10K chip contains 72–1520 LABs

- Each LAB contains 8 Logic Elements (LEs), so a chip contains 576–12,160 LEs, totaling 10,000–250,000 usable gates
- Each chip also contains 3–20 Embedded Array Blocks (EABs), which can provide 6,144–40,960 bits of RAM

### Altera FLEX 10K Embedded Array Blocks (EABs)

- Each chip contains 3–20 EABs, each of which can be used to implement either logic or memory
- When used to implement logic, an EAB can provide 100 to 600 gate equivalents (in contrast, a LAB provides 96 g.e.'s)
  - Provides a very large LUT
    - Very fast: faster than general logic, since it's only a single level of logic
    - Delay is predictable: each RAM block is not scattered throughout the chip as in some FPGAs
  - Can be used to create complex logic functions such as multipliers (e.g., a 4x4 multiplier with 8 inputs and 8 outputs), microcontrollers, large state machines, and DSPs
  - Each EAB can be used independently, or combined to implement larger functions

### Altera FLEX 10K Embedded Array Blocks (cont.)

- Using EABs to implement memory, a chip can have 6K–40K bits of RAM
  - Each EAB provides 2,048 bits of RAM, plus input and output registers
  - Can be used to implement synchronous RAM, ROM, dual-port RAM, or FIFO
  - Each EAB can be configured in the following sizes:
    - 256x8, 512x4, 1024x2, or 2048x1
  - To get larger blocks, combine multiple EABs:
    - Example: combine two 256x8 RAM blocks to form a 256x16 RAM block
    - Example: combine two 512x4 RAM blocks to form a 512x8 RAM block
    - Can even combine all EABs on the chip into one big RAM block
    - Can combine so as to form blocks up to 2048 words without impacting timing
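The EAB sizing rules above reduce to simple arithmetic. The sketch below uses hypothetical helper names and assumes a plain tiling model in which identical EAB configurations are placed side by side (for wider words) or stacked (for more words); it ignores any extra logic needed for address decoding.

```python
EAB_BITS = 2048  # bits of RAM per EAB
EAB_CONFIGS = [(256, 8), (512, 4), (1024, 2), (2048, 1)]  # (words, bits per word)


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def eabs_needed(words, width):
    """Fewest EABs that tile a words x width RAM with one EAB configuration."""
    return min(
        ceil_div(words, cfg_words) * ceil_div(width, cfg_width)
        for cfg_words, cfg_width in EAB_CONFIGS
    )


def chip_ram_bits(num_eabs):
    """Total RAM available when every EAB on the chip is used as memory."""
    return num_eabs * EAB_BITS
```

Under this model a 256x16 or a 512x8 RAM block each needs two EABs, matching the examples in the list, and chips with 3 to 20 EABs provide 6,144 to 40,960 bits, i.e. the 6K–40K range quoted above.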

### Altera FLEX 10K Embedded Array Blocks (cont.)



Figure from Altera technical literature

- EAB gets input from a row channel, and can output to up to 2 row channels and 2 column channels
- Input and output buffers are available

### Altera APEX 20K Overview

### ■ APEX 20K chip contains:

- 256–5,184 LABs, each of which contains 10 Logic Elements (LEs), so a chip contains 2,560–51,840 LEs, 162,000–2,391,552 usable gates
- 16–216 Embedded System Blocks (ESBs), which together provide 32,768–442,368 bits of memory
  - Can implement CAM, RAM, dual-port RAM, ROM, and FIFO

### Organization:

- MultiCore architecture, combining LUT, product-terms, & memory in one structure
  - Designed for "system on a chip"
- MegaLAB structures, each of which contains 16 LABs, one ESB, and a MegaLAB interconnect (for routing within the MegaLAB)
  - ESB provides product terms <u>or</u> memory

# **APEX LABs and Interconnect**

- Logic Array Block (LAB)
  - 10 LEs
  - Interleaved local interconnect (each LE connects to 2 local interconnect, each local interconnect connects to 10 LEs)
    - Each LE can connect to 29 other LEs through local interconnect
- Logic Element (LE)
  - 4-input LUT, carry chain, cascade chain, same as FLEX devices
  - Synchronous and asynchronous load and clear logic
- Interconnect
  - MegaLAB interconnect between 16 LABs, etc. inside each MegaLAB
  - FastTrack row and column interconnect between MegaLABs

# APEX Embedded System Blocks (ESBs)

- Each ESB can act as a macrocell and provide product terms
  - Each ESB gets 32 inputs from local interconnect, from adjacent LAB or MegaLAB interconnect
  - In this mode, each ESB contains 16 macrocells, and each macrocell contains 2 product terms and a programmable register (parallel expanders also provided)
- Each ESB can also act as a memory block (dual-port RAM, ROM, FIFO, or CAM memory) configured in various sizes
  - Inputs from adjacent local interconnect, which can be driven from MegaLAB or FastTrack interconnect
  - Outputs to MegaLAB and FastTrack, some outputs to local interconnect

Code No: R1632043





#### III B. Tech II Semester Supplementary Examinations, November - 2019 **VLSI DESIGN**

(Common to Electronics and Communication Engineering, Electronics

and Computer Engineering)

Time: 3 hours

Max. Marks: 70

Note: 1. Question Paper consists of two parts (Part-A and Part-B)

2. Answer ALL the question in Part-A

3. Answer any FOUR Questions from Part-B

#### PART-A (14 Marks)

| No. |    | Question | Marks |
|----|----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
| 1. | a) | Why is VLSI design process presented in NMOS only? Justify with an example.                                                                                                                                                                                    | [2M]     |
|    | b) | Give the different scaling models and scaling factors.                                                                                                                                                                                                         | [2M]     |
|    | c) | Explain about Inverter Delays.                                                                                                                                                                                                                                 | [2M]     |
|    | d) | Explain about chip output circuit.                                                                                                                                                                                                                             | [3M]     |
|    | e) | What information from the targeted FPGA device is required in RTL synthesis?                                                                                                                                                                                   | [3M]     |
|    | f) | Explain about Clock Design.                                                                                                                                                                                                                                    | [2M]     |
|    |    | PART-B (56 Marks) |          |
| 2. | a) | Derive an equation for $I_{ds}$ of an n-channel Enhancement MOSFET operating in Saturation region.                                                                                                                                                             | [7M]     |
|    | b) | An nMOS transistor is operating in saturation region with the following parameters. $V_{GS} = 5V$ ; $V_{tn} = 1.2V$ ; $W/L = 110$ ; $\mu_n C_{ox} = 110 \ \mu A/V^2$ . Find transconductance of the device.                                                    | [7M]     |
| 3. | a) | Explain about double poly CMOS rules.                                                                                                                                                                                                                          | [7M]     |
|    | b) | Design a layout diagram for CMOS 3-input NAND gate.                                                                                                                                                                                                            | [7M]     |
| 4. | a) | What is meant by sheet resistance $R_s$ ? Explain the concept of $R_s$ applied to MOS transistors.                                                                                                                                                             | [7M]     |
|    | b) | Calculate the ON resistance of an inverter from VDD to GND if the n-channel sheet resistance $R_{sn} = 10^4 \ \Omega$ per square and the p-channel sheet resistance $R_{sp} = 3.5 \times 10^4 \ \Omega$ per square ($Z_{pu}$ = 4:4 and $Z_{pd}$ = 2:2). | [7M]     |
| 5. |    | Discuss in detail about Fault types and Models.                                                                                                                                                                                                                | [14M]    |
| 6. | a) | Write down the step by step approach of FPGA design process on XILINX environment.                                                                                                                                                                             | [7M]     |
|    | b) | Design a queue and write the dataflow style VHDL program for the same.                                                                                                                                                                                         | [7M]     |
| 7. |    | Discuss in detail about Low Power CMOS Logic Circuits.                                                                                                                                                                                                         | [14M]    |

\*\*\*\*


Code No: R1632043





#### III B. Tech II Semester Regular/Supplementary Examinations, October/November - 2020 VLSI DESIGN

(Common to Electronics and Communication Engineering, Electronics and Computer Engineering)

Time: 3 hours Max. Marks: 70

Note:
1. Question Paper consists of two parts (Part-A and Part-B)
2. Answer ALL the questions in Part-A
3. Answer any FOUR Questions from Part-B

|    |    | PART –A (14 Marks) |      |
|----|----|--------------------|------|
| 1. | a) | Describe the ION-IMPLANTATION steps in IC fabrication. | [2M] |
|    | b) | Write a short note on MOS layers and symbolic diagram translation to MASK form. | [2M] |
|    | c) | What are the sources of wiring capacitances? | [2M] |
|    | d) | Define controllability. | [3M] |
|    | e) | Define synthesis and explain its importance. | [3M] |
|    | f) | What is deep submicron digital IC design? | [2M] |
|    |    | PART –B (56 Marks) |      |
| 2. | a) | Derive the relationship between drain to source current $I_{ds}$ and drain to source voltage $V_{ds}$ in the non-saturated and saturated regions. | [7M] |
|    | b) | What are the steps involved in NMOS fabrication? Explain with neat sketches. | [7M] |
| 3. | a) | What is a stick diagram? Draw the stick diagram and layout for a CMOS inverter. | [8M] |
|    | b) | Explain about double poly CMOS rules. | [6M] |
| 4. | a) | What is meant by sheet resistance ($R_s$)? Explain the concept of $R_s$ applied to MOS transistors. | [7M] |
|    | b) | Calculate the resistance of an inverter from VDD to GND if the n-channel sheet resistance $R_{sn} = 10^4 \ \Omega$ per square and the p-channel sheet resistance $R_{sp} = 3.5 \times 10^4 \ \Omega$ per square ($Z_{pu}$ = 4:4 and $Z_{pd}$ = 2:2). | [7M] |
| 5. | a) | Draw the circuit diagram of the Built-In Self Test (BIST) circuit and explain its operation. | [7M] |
|    | b) | List the different fault types that occur in VLSI circuits and explain each fault in detail. | [7M] |
| 6. | a) | Draw and explain the FPGA design flow. | [7M] |
|    | b) | Explain the step-by-step approach of the FPGA design process in the Xilinx environment. | [7M] |
| 7. | a) | Explain the concept of low-power design through voltage scaling in detail. | [7M] |
|    | b) | Write short notes on the following terms: i) Interconnect Design, and ii) Power Grid. | [7M] |

\*\*\*\*



(**Common to** Electronics and Communication Engineering, Electronics and Instrumentation Engineering, Electronics and Computer Engineering)

Time: 3 hours Max. Marks: 70

Note:
1. Question Paper consists of two parts (Part-A and Part-B)
2. Answer ALL the questions in Part-A
3. Answer any FOUR Questions from Part-B

|    |    | PART –A |      |
|----|----|---------|------|
| 1. | a) | Write down the equations for I<sub>ds</sub> of an n-channel enhancement MOSFET operating in the non-saturated region and the saturated region. | [2M] |
|    | b) | Define stick diagram and layout diagram. | [2M] |
|    | c) | Explain about the constraints in choice of layers. | [2M] |
|    | d) | Mention the common techniques involved in ad-hoc testing. | [3M] |
|    | e) | What information from the targeted FPGA device is required in RTL synthesis? | [3M] |
|    | f) | Explain about clock skew. | [2M] |
|    |    | PART –B |      |
| 2. | a) | Explain the nMOS enhancement mode fabrication process for different conditions of $V_{ds}$. | [7M] |
|    | b) | Derive an expression for transconductance of an n-channel enhancement MOSFET operating in active region. | [7M] |
| 3. | a) | Draw a stick diagram and layout for two input CMOS NAND gate indicating all the regions and layers. | [7M] |
|    | b) | Explain 2 µm Double Metal, Double Poly CMOS / BiCMOS Rules. | [7M] |
| 4. | a) | Explain the issues involved in driving large capacitive loads in VLSI circuit regions. | [7M] |
|    | b) | Calculate the gate capacitance value of a 5 µm technology minimum size transistor with a gate to channel capacitance value of $4 \times 10^{-4} \ \text{pF}/\mu\text{m}^2$. | [7M] |
| 5. | a) | Explain about the following types of faults with suitable examples: (i) Stuck at faults (ii) Bridge faults (iii) Temporary faults | [7M] |
|    | b) | Explain the different categories of DFT techniques. | [7M] |
| 6. | a) | Write down the step by step approach for FPGA design process on XILINX environment. | [7M] |
|    | b) | Draw and explain the basic architecture of FPGA. | [7M] |
| 7. | a) | Explain about deep submicron processes with suitable schematic diagrams. | [7M] |
|    | b) | Explain about the scaling limitation for low voltage, low power design. Give the effect of scaling on various MOSFET parameters with necessary equations. | [7M] |

\*\*\*\*\*



(**Common to** Electronics and Communication Engineering, Electronics and Instrumentation Engineering, Electronics and Computer Engineering)

Time: 3 hours Max. Marks: 70

Note:
1. Question Paper consists of two parts (Part-A and Part-B)
2. Answer ALL the questions in Part-A
3. Answer any FOUR Questions from Part-B

|    |      | PART –A                                                                                                                                                                               |      |
|----|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| 1. | a)   | Explain the terms SSI, LSI, and VLSI with the number of transistors per chip and applications.                                                                                        | [2M] |
|    | b)   | Draw the stick diagram for CMOS Inverter.                                                                                                                                             | [2M] |
|    | c)   | What is sheet resistance? Derive the expression for R<sub>s</sub>.                                                                                                                    | [2M] |
|    | d)   | What are the approaches in design for testability?                                                                                                                                    | [3M] |
|    | e)   | Explain synthesis process.                                                                                                                                                            | [3M] |
|    | f)   | What are the different types of power consumption?                                                                                                                                    | [2M] |
|    |      | PART -B                                                                                                                                                                               |      |
| 2. | a)   | Explain in detail the p-well process for CMOS fabrication indicating the masks used.                                                                                                  | [7M] |
|    | b)   | Compare the relative merits of three different forms of pull-up for an inverter circuit.<br>What is the best choice for realization in nMOS and CMOS technology?                      | [7M] |
| 3. | a)   | What are the $\lambda$ -based design rules? Give them for each layer.                                                                                                                 | [7M] |
|    | b)   | Draw a stick diagram for CMOS logic $Y = (A+B+C)'$ .                                                                                                                                  | [7M] |
| 4. | a)   | What is inverter delay? How is delay calculated for multiple stages? Explain.                                                                                                         | [7M] |
|    | b)   | Two nMOS inverters are cascaded to drive a capacitive load $C_L=16C_g$ . Calculate pair delay $V_{in}$ to $V_{out}$ in terms of $\tau$ .                                              | [7M] |
| 5. | a)   | What are the different faults found in combinational circuits? How can they be categorized?                                                                                           | [7M] |
|    | b)   | Briefly discuss about Built-In-Self Test technique with a suitable diagram.                                                                                                           | [7M] |
| 6. | a)   | Give the steps in FPGA design flow with flow diagram and briefly discuss about each step.                                                                                             | [7M] |
|    | b)   | Explain about the principle and operation of FPGAs. What are its applications?                                                                                                        | [7M] |
| 7. | a)   | Discuss about the various problems associated with low voltage VLSI circuit design.                                                                                                   | [7M] |
|    | b)   | Explain about estimation and optimization of switching activity.                                                                                                                      | [7M] |

\*\*\*\*




(Common to Electronics and Communication Engineering, Electronics and Instrumentation Engineering, Electronics and Computer Engineering)

Time: 3 hours Max. Marks: 70

Note:
1. Question Paper consists of two parts (Part-A and Part-B)
2. Answer ALL the questions in Part-A
3. Answer any FOUR Questions from Part-B

|        |       | PART -A                                                                                               |        |
|--------|-------|-------------------------------------------------------------------------------------------------------|--------|
| 1.     | a)    | Define Moore's law.                                                                                   | [2M]   |
|        | b)    | Draw a symbolic layout of a two-input NAND gate.                                                      | [3M]   |
|        | c)    | Give the scaling factor for Maximum operating frequency $(f_0)$ in terms of different scaling models. | [2M]   |
|        | d)    | What is meant by observability?                                                                       | [3M]   |
|        | e)    | What are FPGAs?                                                                                       | [2M]   |
|        | f)    | What is switching activity?                                                                           | [2M]   |
|        |       | PART -B                                                                                               |        |
| 2.     | a)    | Compare BiCMOS technology with other technologies.                                                    | [7M]   |
|        | b)    | Calculate $I_D$ and $V_{DS}$ if $k_n = 100 \ \mu A/V^2$, $V_{tn} = 0.6V$ and W/L = 3 for transistor $M_1$ in the circuit shown below: | [7M]   |
| 3.     | a)    | Explain with suitable examples how to design the layout of a gate to maximize performance and minimize area. | [7M]   |
|    | b)       | Design a stick diagram for nMOS logic $Y = (A+B+C)'$ .                                                                                          | [7M]         |
| 4. | a)       | How are the depletion regions around source and drain affected by scaling down of device dimensions? Explain.                                   | [7M]         |
|    | b)       | Derive the expression for propagation delay in the case of cascaded pass transistors.                                                           | [7M]         |
| 5. | a)       | Define the terms 'failure' and 'fault'. Discuss the different fault models.                                                                     | [7M]         |
|    | b)       | Briefly discuss about On-Chip clock generation and distribution.                                                                                | [7M]         |
| 6. | a)       | Explain the following terms:<br>(i) LUT (ii) CLB (iii) IOB (iv) Switch matrix                                                                   | [8M]         |
|    | b)       | List out the various FPGA families. Explain how they are different from each other?                                                             | [6M]         |
| 7. | a)       | Explain about the design limitations imposed on low power, low voltage circuits pertaining to scaling and interconnect wires.                   | [7M]         |
|    | b)       | Briefly discuss about the different techniques for reduction of switching capacitance.                                                          | [7M]         |

\*\*\*\*



(**Common to** Electronics and Communication Engineering, Electronics and Instrumentation Engineering, Electronics and Computer Engineering)

Time: 3 hours Max. Marks: 70

Note:
1. Question Paper consists of two parts (Part-A and Part-B)
2. Answer ALL the questions in Part-A
3. Answer any FOUR Questions from Part-B

|    |                | PART -A                                                                                                                                                                                                |                      |
|----|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|
| 1. | a)             | Discuss the microelectronics evolution.                                                                                                                                                                | [3M]                 |
|    | b)             | What is a via? How is it constructed in a layout?                                                                                                                                                      | [2M]                 |
|    | c)             | What is the need for scaling in MOS circuits?                                                                                                                                                          | [2M]                 |
|    | d)             | Explain how the function of a system can be tested.                                                                                                                                                    | [2M]                 |
|    | e)             | List out the commercially available FPGAs.                                                                                                                                                             | [3M]                 |
|    | f)             | What is the need of interconnect?                                                                                                                                                                      | [2M]                 |
|    |                | PART -B                                                                                                                                                                                                |                      |
| 2. | a)             | What are the additional two layers in BiCMOS technology compared to others? With neat sketches explain BiCMOS fabrication process.                                                                     | [7M]                 |
|    | b)             | Show that the switching speed of an enhancement MOSFET varies inversely as the square of the channel length.                                                                                           | [7M]                 |
| 3. | a)             | Give the design rules for the following cases with neat sketches: (i) Polysilicon – polysilicon (ii) n-type diffusion – n-type diffusion (iii) n-type diffusion – p-type diffusion (iv) metal 1 – metal 2. | [8M]                 |
|    | b)             | Design a stick diagram for two input pMOS NAND and NOR gates.                                                                                                                                          | [6M]                 |
| 4. |                | Describe the following briefly<br>(i) Cascaded inverters as drivers (ii) Super buffers (iii) BiCMOS drivers                                                                                            | [14M]                |
| 5. | a)             | Explain the terms controllability, observability and fault coverage.                                                                                                                                   | [7M]                 |
|    | b)             | With suitable diagrams, explain the scan-based test techniques.                                                                                                                                        | [7M]                 |
| 6. | a)             | List the different configuration modes in FPGA. Briefly discuss each of them.                                                                                                                          | [7M]                 |
|    | b)             | How are pass transistors used to connect wire segments for the purpose of FPGA programming? Explain.                                                                                                   | [7M]                 |
| 7. | a)             | What are the different technical parameter issues connected with VLSI low power and low voltage design? Explain.                                                                                       | [7M]                 |
|    | b)             | With schematic diagrams explain about deep submicron processes.                                                                                                                                        | [7M]                 |

\*\*\*\*\*

Code No: **RT41028** 



Set No. 1

### IV B.Tech I Semester Supplementary Examinations, February/March - 2018

VLSI DESIGN

(Electrical and Electronics Engineering)

Time: 3 hours

Max. Marks: 70

Question paper consists of Part-A and Part-B Answer ALL sub questions from Part-A Answer any THREE questions from Part-B \*\*\*\*\*

### PART-A (22 Marks)

| 1. | a) | Write the limitations of IC's.                     | [3] |
|----|----|----------------------------------------------------|-----|
|    | b) | Write the problems of Latch-up in CMOS circuits.   | [4] |
|    | c) | What is the need of stick diagrams?                | [3] |
|    | d) | Explain about the constraints in choice of layers. | [4] |
|    | e) | List out the limitations of scaling.               | [4] |
|    | f) | Write the history of VHDL in brief.                | [4] |
|    |    |                                                    |     |

#### **<u>PART-B</u>** (3x16 = 48 Marks)

| 2. | a)       | Why do we use NMOS technology in the design of integrated circuits?                                                                            | [8] |
|----|----------|------------------------------------------------------------------------------------------------------------------------------------------------|-----|
|    | b)       | With neat sketches explain how NPN transistors are fabricated in Bipolar process.                                                              | [8] |
| 3. | a)       | Define the term threshold voltage of MOSFET and explain its significance.                                                                      | [8] |
|    | b)       | Explain latch-up problem in CMOS circuits.                                                                                                     | [8] |
| 4. | a)       | Explain the procedure for drawing the stick diagram for nMOS design style.                                                                     | [8] |
|    | b)       | Explain in brief about the general observations on the design rules.                                                                           | [8] |
| 5. | a)       | Define and explain the standard unit of capacitance.                                                                                           | [8] |
|    | b)       | Define fan-in and fan-out. Explain their effects on propagation delay.                                                                         | [8] |
| 6. | a)       | Discuss the limits due to sub threshold currents.                                                                                              | [8] |
|    | b)       | Explain clocked CMOS logic and domino logic.                                                                                                   | [8] |
| 7. | a)       | Discuss the hardware synthesis process.                                                                                                        | [8] |
|    | b)       | Classify and explain the digital simulation method.                                                                                            | [8] |


Code No: **RT41028** 



Set No. 1

IV B.Tech I Semester Regular/Supplementary Examinations, Oct/Nov - 2018

**VLSI DESIGN** 

(Electrical and Electronics Engineering)

Time: 3 hours

Max. Marks: 70

Question paper consists of Part-A and Part-B Answer ALL sub questions from Part-A Answer any THREE questions from Part-B \*\*\*\*\*

### PART-A (22 Marks)

| 1. | a)       | What is the size of silicon wafer used for manufacturing state-of-the-art VLSI ICs? Explain why.                             | [4]        |
|----|----------|------------------------------------------------------------------------------------------------------------------------------|------------|
|    | b)       | Define Figure of Merit with the necessary expression.                                                                        | [3]        |
|    | c)       | Design a layout diagram for two input nMOS NAND gate.                                                                        | [4]        |
|    | d)       | List and explain the three sources of wiring capacitances.                                                                   | [3]        |
|    | e)       | What are the effects of scaling on $V_t$?                                                                                    | [4]        |
|    | f)       | Define library. Give the syntax of signal assignment statement.                                                              | [4]        |
|    |          | <b>PART–B</b> $(3x16 = 48 Marks)$                                                                                            |            |
| 2. | a)       | Explain the operation of the MOS transistor in depletion mode with the help of neat sketches.                                | [8]        |
|    | b)       | Discuss the steps involved in BiCMOS technology.                                                                             | [8]        |
| 3. | a)       | Clearly explain body effect of MOSFET.                                                                                       | [8]        |
|    | b)       | Design and draw the circuit diagram of an nMOS inverter and explain its operation with the help of transfer characteristics. | [8]        |
| 4. | a)       | Design a stick diagram for two input nMOS NOR Gate.                                                                          | [8]        |
|    | b)       | Discuss the transistor-related design rules (Orbit 2 µm CMOS).                                                               | [8]        |
| 5. | a)       | Explain in detail the formal estimation of CMOS inverter delay.                                                              | [8]        |
|    | b)       | Discuss nMOS transistor as a switch.                                                                                         | [8]        |
| 6. | a)       | Explain the limits of miniaturization on scaling.                                                                            | [8]        |
|    | b)       | In gate logic, compare the geometric aspects between two-input nMOS NAND and CMOS NAND gates.                                | [8]        |
| 7. | a)       | List the various abstraction levels in VHDL. Explain any one of them.                                                        | [8]        |
|    | b)       | Write briefly about the logic synthesis process.                                                                             | [8]        |
